Reproduction of genome correction software

Thadani, Arav Kishore
The primary errors made by genome sequencing machines today are insertions, substitutions, and deletions. These errors have led to the development of genome error-correction software, which uses a variety of algorithms to correct errors in sequencing data. The purpose of this study is to test around 12 of the most popular genome correction tools and to compare the results we obtain with the results reported by their authors. We use Nextflow as the pipeline software and Docker containers so that the environment remains constant and anyone after us can replicate our results. Each test case starts with a Docker container in which the correction software is preinstalled along with the indexing software. Next comes a Nextflow template that specifies the datasets to be tested. The pipeline then performs an initial round of indexing, followed by running the correction software itself on the dataset. Finally, it performs another round of indexing and measures the results by running a script that reports how well the software performed. These evaluation programs are usually custom Python scripts whose output follows the format used in each correction tool's paper. We have published a website featuring all of our results, organized by tool. For each tool, the website shows our results next to the published results, along with the discrepancy between the two.
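The final evaluation step can be illustrated with a short Python sketch. This is not the thesis's actual script, because those scripts follow the output format of each tool's paper. The sketch assumes, hypothetically, that each corrected read is paired with the reference segment it was sampled from, as with simulated data. It counts substitutions, insertions, and deletions using a unit-cost edit-distance (Levenshtein) alignment, then computes an error rate and its signed discrepancy from a published figure. The names `count_edits`, `error_rate`, and `discrepancy` are illustrative, and the published value in the example is a placeholder.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EditCounts:
    substitutions: int = 0
    insertions: int = 0  # extra bases in the read that are absent from the reference
    deletions: int = 0   # reference bases that are missing from the read

    @property
    def total(self) -> int:
        return self.substitutions + self.insertions + self.deletions


def count_edits(read: str, reference: str) -> EditCounts:
    """Classify the edits in one minimum-cost (unit-cost Levenshtein)
    alignment of `read` against the reference segment it came from."""
    n, m = len(read), len(reference)
    # dp[i][j] = edit distance between read[:i] and reference[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(read[i - 1] != reference[j - 1])
            dp[i][j] = min(dp[i - 1][j - 1] + mismatch,  # match or substitution
                           dp[i - 1][j] + 1,             # insertion in the read
                           dp[i][j - 1] + 1)             # deletion from the read
    # Trace back one optimal alignment; ties prefer match/substitution,
    # then insertion, then deletion.
    counts = EditCounts()
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = int(read[i - 1] != reference[j - 1])
            if dp[i][j] == dp[i - 1][j - 1] + mismatch:
                counts.substitutions += mismatch
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            counts.insertions += 1
            i -= 1
        else:
            counts.deletions += 1
            j -= 1
    return counts


def error_rate(reads: List[str], references: List[str]) -> float:
    """Total edits divided by total reference bases over paired reads."""
    if len(reads) != len(references):
        raise ValueError("each read needs exactly one reference segment")
    edits = sum(count_edits(r, ref).total for r, ref in zip(reads, references))
    bases = sum(len(ref) for ref in references)
    return edits / bases


def discrepancy(reproduced: float, published: float) -> float:
    """Signed difference between a reproduced metric and the published one."""
    return reproduced - published


if __name__ == "__main__":
    corrected = ["ACTT", "ACGGT"]  # toy corrected reads (one substitution, one insertion)
    truth = ["ACGT", "ACGT"]       # reference segments the reads were sampled from
    rate = error_rate(corrected, truth)  # 2 edits / 8 reference bases = 0.25
    published_rate = 0.2  # hypothetical placeholder, not a value from any paper
    print(f"reproduced error rate: {rate:.3f}")
    print(f"discrepancy vs. published: {discrepancy(rate, published_rate):+.3f}")
```

The quadratic dynamic program is only practical for short toy reads. A real evaluation would more likely work from a read aligner's output. Different optimal alignments can also split the same total differently among the three edit types, so only the total is alignment-independent.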
Resource Subtype
Undergraduate Thesis