Title:
Techniques to improve genome assembly quality

dc.contributor.advisor Aluru, Srinivas
dc.contributor.author Nihalani, Rahul
dc.contributor.committeeMember Vuduc, Richard
dc.contributor.committeeMember Jordan, King
dc.contributor.committeeMember Wang, May Dongmei
dc.contributor.committeeMember Catalyurek, Umit V.
dc.contributor.department Computational Science and Engineering
dc.date.accessioned 2019-05-29T14:03:44Z
dc.date.available 2019-05-29T14:03:44Z
dc.date.created 2019-05
dc.date.issued 2019-03-28
dc.date.submitted May 2019
dc.date.updated 2019-05-29T14:03:44Z
dc.description.abstract De novo genome assembly is an important problem in the field of genomics. Discovering and analysing the genomes of different species has numerous applications. For humans, it can lead to early detection of disease traits and timely prevention of diseases such as cancer. It is also useful for discovering the genomes of previously unknown species. Although the problem has received enormous attention over the last couple of decades, various scientific studies show that it has not yet been solved satisfactorily. Paired-end sequencing is a technology that sequences pairs of short strands, called reads, from a genome. The two reads of a pair originate from nearby genomic locations, and paired reads are commonly used to determine the genomic location of individual reads more accurately and to resolve repeats in genome assembly. In this thesis, we describe the genome assembly problem and the key challenges involved in solving it. We discuss related work, describing the two most popular models for approaching the problem, de Bruijn graphs and overlap graphs, along with their pros and cons. We then describe our proposed techniques to improve the quality of genome assembly. Our main contribution is a de Bruijn graph-based assembly algorithm that effectively utilizes paired reads to improve assembly quality. We also discuss how our algorithm tackles some of the key challenges involved in genome assembly. We adapt this algorithm into a parallel strategy that obtains high-quality assemblies for large datasets, such as rice, within a reasonable time frame. In addition, we describe our work on probabilistically estimating overlap graphs for large short-read datasets. We discuss the results obtained and conclude with directions for future work.
dc.description.degree Ph.D.
dc.format.mimetype application/pdf
dc.identifier.uri http://hdl.handle.net/1853/61272
dc.language.iso en_US
dc.publisher Georgia Institute of Technology
dc.subject Genome assembly
dc.subject High performance computing
dc.title Techniques to improve genome assembly quality
dc.type Text
dc.type.genre Dissertation
dspace.entity.type Publication
local.contributor.corporatename College of Computing
local.contributor.corporatename School of Computational Science and Engineering
relation.isOrgUnitOfPublication c8892b3c-8db6-4b7b-a33a-1b67f7db2021
relation.isOrgUnitOfPublication 01ab2ef1-c6da-49c9-be98-fbd1d840d2b1
thesis.degree.level Doctoral
Files
Original bundle
Name: NIHALANI-DISSERTATION-2019.pdf
Size: 875.76 KB
Format: Adobe Portable Document Format
License bundle
Name: LICENSE.txt
Size: 3.87 KB
Format: Plain Text