Towards optimal assembly for high throughput shotgun sequencing
Thu., Nov 15, 2012
3:00 - 4:00 PM
400 Cory Hall
DNA sequencing is the basic workhorse of modern day biology and medicine. Shotgun sequencing is the dominant technique used: many randomly located short fragments called reads are extracted from the DNA sequence, and these reads are assembled to reconstruct the original sequence. During the last two decades, many assembly algorithms have been proposed, but comparing and evaluating them is difficult. To clarify this, we ask: Given N reads of length L sampled from an arbitrary DNA sequence, is it possible to achieve some target probability 1-eps of successful reconstruction? We show that the answer depends on the repeat statistics of the DNA sequence to be assembled, and we compute these statistics for a number of reference genomes. We construct lower bounds showing that reconstruction is impossible for certain choices of N and L, and complement this by analytically deriving the performance of several algorithms, both in terms of repeat statistics. In seeking an algorithm whose performance matches the lower bounds, on real DNA data, we are able to methodically progress towards an optimal assembly algorithm. The goal of this work is to advocate a new approach to the design of assembly algorithms based on an information theoretic criterion.

Bio: Guy Bresler is a PhD candidate in the Department of Electrical Engineering and Computer Sciences at UC Berkeley, advised by David Tse. His research interests include information theory, applied probability, and computational biology. He received the B.S. degree in electrical and computer engineering and the M.S. degree in mathematics from the University of Illinois at Urbana-Champaign, both in 2006. Guy is the recipient of an NSF Graduate Research Fellowship, a Vodafone Graduate Fellowship, the Barry M. Goldwater Scholarship, a Vodafone Undergraduate Scholarship, the E.C. Jordan Award from the ECE department at UIUC, and the Roberto Padovani Scholarship from Qualcomm.
UC Berkeley Networking
Ashwin Pananjady and Orhan Ocal
Last Modification Date: Wednesday, February 10, 2016