Digital Life Norway - "Towards Perfect de novo DNA Sequencing"

With the advent of long-read sequencers such as the PacBio RS II, near-perfect de novo reconstruction of unknown genomes is once again a realistic possibility. We will explain why, and offer a hypothesis for why assemblers have improved only marginally since the era of the Human Genome Project circa 2000: the bottleneck is not the assembly itself but the artifacts in the reads and the resolution of repeat families, topics that have not received sufficient attention and that are particularly critical for long reads.

Therefore, we are developing algorithms that carefully analyze a long-read shotgun data set before assembly. By efficiently comparing all the data against itself, we have developed a computational approach that accurately determines the quality of any stretch of a PacBio read from the sequence data alone. These regional QVs allow us to accurately identify low-quality regions, chimers, and missed adaptamers. Removing these artifacts, a process we call scrubbing, leaves reads that assemble without the need for base-level error correction. We further find that we can identify and annotate repetitive sequences prior to assembly, although this aspect is still a work in progress.
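
As a rough illustration of the scrubbing idea (not the lab's actual implementation), the sketch below assumes each read has already been divided into fixed-length segments and that each segment carries a hypothetical regional error score, with lower meaning better. It keeps only the stretches where every segment scores below a threshold. The function name, parameters, and threshold values are all assumptions made for this example.

```python
from typing import List, Tuple


def high_quality_intervals(
    segment_scores: List[int],
    segment_len: int = 100,   # assumed bases per segment (hypothetical)
    bad_score: int = 30,      # assumed cutoff: scores >= this are low quality
    min_keep: int = 300,      # assumed minimum length of a stretch worth keeping
) -> List[Tuple[int, int]]:
    """Return [start, end) base intervals made only of good segments.

    Assumes lower scores are better and that the read length is a
    multiple of segment_len. This is a toy model of scrubbing only.
    """
    intervals = []
    run_start = None  # index of the first segment in the current good run
    for i, score in enumerate(segment_scores):
        if score < bad_score:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # A bad segment ends the current good run.
            intervals.append((run_start * segment_len, i * segment_len))
            run_start = None
    if run_start is not None:
        # The read ended while still inside a good run.
        intervals.append((run_start * segment_len, len(segment_scores) * segment_len))
    # Drop good stretches that are too short to be useful.
    return [(s, e) for s, e in intervals if e - s >= min_keep]


if __name__ == "__main__":
    scores = [10, 12, 45, 11, 9, 13, 14, 50, 8]
    print(high_quality_intervals(scores))  # [(300, 700)]
```

In this toy input, the segments scoring 45 and 50 split the read into three good stretches, and only the 400-base stretch in the middle is long enough to keep.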

We will conclude by presenting a number of sequencing projects we are undertaking and describing the assembly tools currently available from our lab.

About Dr. Myers

Dr. Myers is Director and Tschira Chair of Systems Biology at the Max Planck Institute for Molecular Cell Biology and Genetics. He is best known for the development of BLAST, the most widely used tool in bioinformatics, and for the paired-end whole-genome shotgun sequencing protocol and the assembler he developed at Celera, which delivered the fly, human, and mouse genomes in a three-year period.

Published Apr. 13, 2016 9:12 AM - Last modified Apr. 13, 2016 12:51 PM