This project is no longer available

Using machine learning to make sense of our genome

We have developed a large open-source software system that allows researchers to perform statistical analyses of genomic data through a web interface (hyperbrowser.uio.no). The philosophy behind the system has been to determine a set of fundamental, generic components, which can subsequently be combined in exponentially many ways to allow a broad range of precise questions to be asked. The current system is based solely on a statistical testing paradigm, where users select hypothesis tests, statistical assumptions and so forth.


This project explores the use of machine learning to analyze genome data. The goal is to identify and build generic components that, as in the statistical paradigm, can be combined to support a broad range of analyses, but here focused on machine learning. That is, instead of hypothesis testing, this functionality should allow learning how different data sets relate, predicting the value of some property from the values of other data sets, and so on. Since several general machine learning algorithms, such as neural networks, decision trees and support vector machines, are readily available, the focus of the project is on how to build a generic system for analyzing diverse genome-related questions with machine learning methods.
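To make the idea of combinable components a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not part of the HyperBrowser code base: the representation of a data set as a list of per-bin values, the names `build_features`, `NearestCentroid` and `predict_track`, and the toy data are all hypothetical. The nearest-centroid learner is a deliberately simple stand-in for the readily available algorithms mentioned above.

```python
from typing import Dict, List, Sequence

# Assumption: each data set ("track") is a list with one value per genomic bin.
Tracks = Dict[str, list]


def build_features(tracks: Tracks, names: Sequence[str]) -> List[List[float]]:
    """Generic component: combine the chosen tracks into one feature vector per bin."""
    return [list(row) for row in zip(*(tracks[n] for n in names))]


class NearestCentroid:
    """Stand-in learner; any object with fit/predict could be plugged in instead."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        # One mean feature vector (centroid) per label.
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        # Assign each row the label of its closest centroid.
        return [
            min(self.centroids, key=lambda lab: sq_dist(row, self.centroids[lab]))
            for row in X
        ]


def predict_track(tracks, feature_names, target_name, learner, train_bins, test_bins):
    """Generic analysis: predict one track from other tracks with any learner."""
    X = build_features(tracks, feature_names)
    y = tracks[target_name]
    learner.fit([X[i] for i in train_bins], [y[i] for i in train_bins])
    return learner.predict([X[i] for i in test_bins])


# Toy data, invented purely for illustration.
toy_tracks = {
    "gc_content": [0.2, 0.3, 0.8, 0.9, 0.25, 0.85],
    "coverage": [1.0, 1.2, 5.0, 5.5, 0.9, 5.2],
    "is_gene": [0, 0, 1, 1, 0, 1],
}
predicted = predict_track(
    toy_tracks, ["gc_content", "coverage"], "is_gene",
    NearestCentroid(), train_bins=[0, 1, 2, 3], test_bins=[4, 5],
)
```

The point of the sketch is the separation of concerns: feature construction, the learner, and the analysis question are independent pieces, so the same `predict_track` call could in principle be reused with other tracks or another learner.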


As the HyperBrowser system is already running with the statistical functionality and has a reasonable number of international users, it would be interesting to add machine learning-based functionality as a beta feature while it is developed during the master's project. This could yield useful feedback from both internal and external users, and may also provide relevant case studies to develop further.


Students should be skilled in programming and have an interest in machine learning or mathematics in general. No prior knowledge of biology is needed.

Published May 24, 2013 15:27 - Last modified Feb. 26, 2014 17:12

Supervisor(s)

Scope (credits)

60