This project is no longer available

Optimizing backbones when using higher-order/interpreted languages

Interpreted languages like Python are typically between ten and a hundred times slower than C when code is written without special concern for efficiency. This performance gap can often be greatly reduced by finding a small part of the code that is the bottleneck and then optimizing it, for instance by using efficient vector operations or by writing a small module for the bottleneck in a compiled language like C. Alternatively, Cython and similar tools allow type declarations and other constructs that facilitate optimization to be integrated directly alongside the standard interpreted Python code. Techniques like these make it easy to optimize critical computations in a small, localized part of the program. However, matters are less straightforward if profiling shows that most of the execution time is spent in backbone code that is tightly integrated with a large proportion of the full code base, e.g. as base classes that most of the remaining code inherits from.
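To make the last point concrete, the following is a minimal sketch of how profiling can reveal a backbone bottleneck. The classes Track and MeanStatistic and the helper profile_calls are hypothetical names invented for this illustration and are not taken from the actual codebase. The sketch only uses cProfile and pstats from the Python standard library.

```python
import cProfile
import pstats


class Track:
    """Hypothetical backbone base class; every analysis inherits from it."""

    def __init__(self, values):
        self._values = list(values)

    def get(self, index):
        # Generic accessor, called once per element by all subclasses.
        return self._values[index]

    def __len__(self):
        return len(self._values)


class MeanStatistic(Track):
    """Hypothetical analysis class built on top of the backbone."""

    def compute(self):
        total = 0.0
        for i in range(len(self)):
            total += self.get(i)
        return total / len(self)


def profile_calls(func):
    """Run func under cProfile; return (result, {function name: call count})."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    stats = pstats.Stats(profiler)
    counts = {}
    # Each entry maps (file, line, name) to (primitive calls, total calls,
    # own time, cumulative time, callers).
    for (_file, _line, name), (_cc, nc, _tt, _ct, _callers) in stats.stats.items():
        counts[name] = counts.get(name, 0) + nc
    return result, counts


stat = MeanStatistic(range(10000))
mean, call_counts = profile_calls(stat.compute)
print("mean:", mean)
print("calls to Track.get:", call_counts["get"])
```

In this toy case the analysis method itself is trivial, and nearly all calls land in the inherited accessor. Rewriting compute alone would therefore not help much if every other subclass goes through the same base class method.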

The task is to look at techniques for optimizing the execution of interpreted code when the backbone code is the bottleneck, and to evaluate to what degree optimization is possible without sacrificing the convenience and flexibility that are the reason for choosing a dynamic language in the first place. We have a large codebase for statistical analysis of genome data that can serve as a case for the task. This codebase integrates tens of data formats and tens of statistical analysis algorithms, and processes gigabytes of data in a single run.
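As one possible direction, and only as an illustrative sketch, a backbone class could offer a bulk accessor next to its per-element one, so that subclasses keep the freedom to override element access. All names here (Track, ScaledTrack, iter_values, mean_per_element, mean_bulk) are hypothetical and not taken from the actual codebase. Whether this pays off in practice is exactly the kind of question the task is meant to evaluate.

```python
class Track:
    """Hypothetical backbone base class with a per-element accessor."""

    def __init__(self, values):
        self._values = list(values)

    def get(self, index):
        return self._values[index]

    def __len__(self):
        return len(self._values)

    def iter_values(self):
        # Bulk accessor: one method call instead of one per element.
        # If a subclass overrides get(), fall back to calling it so the
        # customized behaviour is preserved.
        if type(self).get is Track.get:
            return iter(self._values)
        return (self.get(i) for i in range(len(self)))


class ScaledTrack(Track):
    """Hypothetical subclass that customizes element access."""

    def get(self, index):
        return 2 * self._values[index]


def mean_per_element(track):
    # Original style: one Python-level method call per element.
    return sum(track.get(i) for i in range(len(track))) / len(track)


def mean_bulk(track):
    # Same result through the bulk accessor.
    return sum(track.iter_values()) / len(track)


plain = Track(range(1000))
scaled = ScaledTrack(range(1000))
print(mean_per_element(plain), mean_bulk(plain))
print(mean_per_element(scaled), mean_bulk(scaled))
```

Both functions return the same values for both classes. The standard library's timeit module could be used to check whether the bulk path is actually faster for a given workload.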

Good programming skills are necessary. No prior knowledge of biology is needed.

Published June 6, 2011 14:57

Supervisor(s)

Scope (credits)

60