Motif discovery

Motif discovery in DNA

During my PhD I collaborated closely with Finn Drabløs, as well as with Osman Abul, Kjetil Klepper, Jostein Johansen, Vegard Walseng, Øystein Bø Syrstad, Lars Eidsheim and Magnus Nedland, on different projects. I also discussed many interesting issues with Arne Halaas, Rolv Seehuus and Magnus Lie, though we never wrote any articles together.

Together with Finn Drabløs, I wrote a survey of motif discovery in DNA in which we described a formal mathematical model of the motif discovery process and classified the existing literature (around 100 methods) according to this model. Although this allowed us to place the existing methods precisely, we realized that it was still very difficult to say which methods performed best. We therefore developed two new benchmarks. First, I developed a set of benchmarks for the discovery of single motifs, in which we distinguished between modelling motifs as sensitively as possible and finding the best instances according to standard motif models. Second, I contributed to a benchmark for the discovery of cis-regulatory modules, which we constructed from co-occurring binding sites found in the TRANSFAC database.

I developed a discretized method, Compo, and contributed to a probabilistic method, Baycis, for the discovery of cis-regulatory modules in DNA (in addition to an early article on GCMD, a method for composite motif discovery in proteins). I also contributed to articles on iterated motif discovery in settings where gene expression data are available, on controlling the false discovery rate in motif discovery, and on a two-step method for single motif discovery. The Compo method was later applied to motif discovery in an allergy setting.

Published Aug. 1, 2017 10:25 PM - Last modified Aug. 1, 2017 10:25 PM