Genomic Analysis of Micro-inversions Based on High-Throughput Sequencing
Author(s)
Qu, Li
Advisor(s)
Zhu, Huaiqiu
Abstract
Genomic structural variations (SVs) are generally defined to include insertions, deletions, duplications, translocations, copy number variations and inversions. Like other types of variation, inversions are of great significance for studying disease susceptibility, population diversity and human evolution. In the last few years, inversions have drawn increasing attention owing to the large amount of data produced by high-throughput sequencing. In this dissertation, we define micro-inversions (MIs) as inversions longer than 10 bp and shorter than 100 bp. A systematic analysis of MIs is still lacking, mainly because previous SV detection tools usually discard unmapped reads entirely, even though these reads may contain SVs, consisting mainly of small-scale MIs, that cause them to fail to map to the reference genome. Fortunately, the MI detection tool (MID), which uses unmapped reads to detect MIs, was developed by our lab in recent years and made the analysis of MIs possible. In this dissertation, we carried out a comprehensive systems biology analysis of MIs detected from high-throughput sequencing data, covering both healthy genomes at the population level and cancer genomes. Specifically, we accomplished the following two pieces of work:
(1) Based on the healthy individual genomes from the 1000 Genomes Project (1KGP), we detected abundant MIs in 1,937 samples from 26 populations all over the world, built a landscape of MIs in non-disease individuals, and carried out a comparative analysis with MIs in non-human primate genomes from the University of California Santa Cruz (UCSC) Genome Browser Database. Specifically, we discovered 6,968 MIs in human individuals with MID and 24,476 MIs in non-human primates, including chimpanzee, gorilla, orangutan, gibbon, baboon, rhesus monkey, and squirrel monkey, with the searchUMI tool. The results for human genomes showed that MIs were rarely located in exon regions and that gene density might affect the distribution of MIs among chromosomes. Among the five super-populations, Africans had the most MIs and East Asians had the fewest, which was consistent with previous research on single nucleotide polymorphisms (SNPs). Furthermore, the average number of MIs in the five super-populations followed a linear relationship in the descending order Africa > America > Europe > South Asia > East Asia. This order also coincided with the “Out of Africa” hypothesis, which assumes that humans originated in Africa and later migrated to other continents. In addition, Africans shared the most MIs with non-human primates, which also supported the “Out of Africa” hypothesis. The results of the phylogenetic tree and PCA not only met our expectations but also reflected a regional pattern among the 26 populations, suggesting that ethnic groups living geographically closest to one another have a relatively small MI genetic distance. The clustering of MIs in the human populations also coincided with human migration history and ancestral lineage. Thus, we proposed that MIs are potential evolutionary markers for investigating population dynamics.
In general, we carried out a comprehensive analysis of MIs in human genomes, and our results revealed the diversity of MIs in human populations and showed that they were related to evolution, environmental adaptation, and health. These MI results may further support the analysis of human genome diversity and the reconstruction of the human evolutionary process.
(2) Based on the cancer genome data from the Sequence Read Archive (SRA) database, we further detected and analyzed the MIs of 451 samples from six cancers: esophageal cancer, bladder cancer, hepatocellular carcinoma, lung cancer, prostate cancer, and pancreatic cancer. We used the 1,937 healthy individuals from 1KGP as control samples. We first analyzed the distribution of MIs among chromosomes in the genomes of the six cancers. The results showed both similarities and differences in the chromosomal distribution of MIs among the cancers. We also found that the number of MIs in cancer samples was much higher than that in healthy samples. Among the six cancers, prostate cancer had the most MIs and hepatocellular carcinoma had the fewest. Moreover, we analyzed the genes with frequent MIs in the six cancers. The results showed that the genes in which MIs frequently appeared were specific to each cancer, and many of these genes were closely related to cancer. In addition, we compared the MIs we detected with previously reported SNPs and found that 132 SNPs overlapped with MIs. In summary, our analysis showed that the number of MIs, as well as their chromosome and gene preferences, differed among the six cancers. These divergent MIs may help with personalized diagnosis and therapy of the six cancers in the future.
In conclusion, based on high-throughput sequencing data, we focused on small-scale micro-inversions (MIs), which have long been ignored, and carried out a comprehensive bioinformatics analysis of a large number of MIs in human genomes. The analysis of MIs in healthy individual genomes may improve our understanding of human genetic diversity and evolution. At the same time, through the comparative genomics analysis of MIs in different cancers, we hope to contribute to precision medicine and to revealing disease mechanisms.
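The definition of an MI used in the abstract (an inversion longer than 10 bp and shorter than 100 bp) can be illustrated with a minimal Python sketch. It is not the MID or searchUMI implementation. The function names, the toy reference sequence and the coordinates are hypothetical, and it assumes the usual convention that an inverted segment appears on the forward strand as its reverse complement.

```python
# Illustrative sketch only; not the MID or searchUMI algorithm.
# All names and the toy sequence below are hypothetical.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_micro_inversion_length(length: int) -> bool:
    """Length criterion from the abstract: longer than 10 bp, shorter than 100 bp."""
    return 10 < length < 100


def apply_inversion(reference: str, start: int, end: int) -> str:
    """Invert reference[start:end] (0-based, end-exclusive) in place.

    The inverted segment is replaced by its reverse complement, so the
    total sequence length is unchanged.
    """
    if not 0 <= start < end <= len(reference):
        raise ValueError("start/end must satisfy 0 <= start < end <= len(reference)")
    inverted = reverse_complement(reference[start:end])
    return reference[:start] + inverted + reference[end:]


# Toy reference and a 16 bp inversion, which falls in the MI length range.
reference = "ACGTTGCAAGGCTTACCGATAGGCTTAACGGATCCA"
start, end = 8, 24
sample = apply_inversion(reference, start, end)
```

A read drawn from `sample` that spans the inverted segment differs from `reference` across that whole segment rather than at isolated positions, which is one way such a read can fail to map and end up among the unmapped reads.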
Date
2020-03-11
Resource Type
Text
Resource Subtype
Dissertation