Person:
Borodovsky, Mark

Publication Search Results

  • Item
    Identifying a Type of Genetic Code in an Anonymous, Prokaryotic DNA Sequence
    (Georgia Institute of Technology, 2020-01) Pfennig, Aaron ; Lomsadze, Alexandre ; Borodovsky, Mark
    Here we present an ab initio approach for predicting the genetic code of an anonymous prokaryotic DNA sequence. To the best of our knowledge, it is the first tool of its kind. In the age of metagenomics, more and more non-cultivable species are being sequenced, bringing an increasing number of discoveries of alterations of the canonical genetic code. The Genetic Code Detector (GCD) described below is capable of identifying the genetic code of complete genomes with a sensitivity and specificity of 1.0. Furthermore, it performs well on contigs as small as 10 kbp, with a specificity of 0.99 and a sensitivity of 0.92. Recently, the class of crAssphage has been discovered, which shows two different genetic codes. Hence, it is of interest to predict the position in the genome where the genetic code changes. The presented GCD is capable of predicting the switching point with a mean error of 0.53 genes and a standard deviation of 6.47 genes.
  • Item
    GeneTack database: genes with frameshifts in prokaryotic genomes and eukaryotic mRNA sequences
    (Georgia Institute of Technology, 2013) Antonov, Ivan ; Baranov, Pavel ; Borodovsky, Mark
    Database annotations of prokaryotic genomes and eukaryotic mRNA sequences pay relatively little attention to frame transitions that disrupt protein-coding genes. Frame transitions (frameshifts) could be caused by sequencing errors or indel mutations inside protein-coding regions. Other observed frameshifts are related to recoding events (that evolved to control expression of some genes). Earlier, we developed an algorithm and software program, GeneTack, for ab initio frameshift finding in intronless genes. Here, we describe a database (freely available at http://topaz.gatech.edu/GeneTack/db.html) containing genes with frameshifts (fs-genes) predicted by GeneTack. The database includes 206 991 fs-genes from 1106 complete prokaryotic genomes and 45 295 frameshifts predicted in mRNA sequences from 100 eukaryotic genomes. The whole set of fs-genes was grouped into clusters based on sequence similarity between fs-proteins (conceptually translated fs-genes), conservation of the frameshift position and frameshift direction (-1, +1). The fs-genes can be retrieved by similarity search to a given query sequence via a web interface, by fs-gene cluster browsing, etc. Clusters of fs-genes are characterized with respect to their likely origin, such as pseudogenization, phase variation, etc. The largest clusters contain fs-genes with programmed frameshifts (related to recoding events).
  • Item
    TrueSight: a new algorithm for splice junction detection using RNA-seq
    (Georgia Institute of Technology, 2012) Li, Yang ; Li-Byarlay, Hongmei ; Burns, Paul ; Borodovsky, Mark ; Robinson, Gene E. ; Ma, Jian
    RNA-seq has proven to be a powerful technique for transcriptome profiling based on next-generation sequencing (NGS) technologies. However, due to the short length of NGS reads, it is challenging to accurately map RNA-seq reads to splice junctions (SJs), which is a critically important step in the analysis of alternative splicing (AS) and isoform construction. In this article, we describe a new method, called TrueSight, which for the first time combines RNA-seq read mapping quality and coding potential of genomic sequences into a unified model. The model is further utilized in a machine-learning approach to precisely identify SJs. Both simulations and real data evaluations showed that TrueSight achieved higher sensitivity and specificity than other methods. We applied TrueSight to new high coverage honey bee RNA-seq data to discover novel splice forms. We found that 60.3% of honey bee multi-exon genes are alternatively spliced. By utilizing gene models improved by TrueSight, we characterized AS types in honey bee transcriptome. We believe that TrueSight will be highly useful to comprehensively study the biology of alternative splicing.
  • Item
    The genome of the polar eukaryotic microalga Coccomyxa subellipsoidea reveals traits of cold adaptation
    (Georgia Institute of Technology, 2012) Blanc, Guillaume ; Agarkova, Irina ; Grimwood, Jane ; Kuo, Alan ; Brueggeman, Andrew ; Dunigan, David D. ; Gurnon, James ; Ladunga, Istvan ; Lindquist, Erika ; Lucas, Susan ; Pangilinan, Jasmyn ; Pröschold, Thomas ; Salamov, Asaf ; Schmutz, Jeremy ; Weeks, Donald ; Yamada, Takashi ; Lomsadze, Alexandre ; Borodovsky, Mark ; Claverie, Jean-Michel ; Grigoriev, Igor V. ; Van Etten, James L.
    Background: Little is known about the mechanisms of adaptation of life to the extreme environmental conditions encountered in polar regions. Here we present the genome sequence of a unicellular green alga from the division Chlorophyta, Coccomyxa subellipsoidea C-169, which we will hereafter refer to as C-169. This is the first eukaryotic microorganism from a polar environment to have its genome sequenced. Results: The 48.8 Mb genome contained in 20 chromosomes exhibits significant synteny conservation with the chromosomes of its relatives Chlorella variabilis and Chlamydomonas reinhardtii. The order of the genes is highly reshuffled within synteny blocks, suggesting that intra-chromosomal rearrangements were more prevalent than inter-chromosomal rearrangements. Remarkably, Zepp retrotransposons occur in clusters of nested elements with strictly one cluster per chromosome, probably residing at the centromere. Several protein families overrepresented in C. subellipsoidea include proteins involved in lipid metabolism, transporters, cellulose synthases and short alcohol dehydrogenases. Conversely, C-169 lacks proteins that exist in all other sequenced chlorophytes, including components of the glycosyl phosphatidyl inositol anchoring system, pyruvate phosphate dikinase and the photosystem 1 reaction center subunit N (PsaN). Conclusions: We suggest that some of these gene losses and gains could have contributed to adaptation to low temperatures. Comparison of these genomic features with the adaptive strategies of psychrophilic microbes suggests that prokaryotes and eukaryotes followed comparable evolutionary routes to adapt to cold environments.
  • Item
    The Genome Sequence of the North-European Cucumber (Cucumis sativus L.) Unravels Evolutionary Adaptation Mechanisms in Plants
    (Georgia Institute of Technology, 2011) Wóycicki, Rafał ; Witkowicz, Justyna ; Gawroński, Piotr ; Dąbrowska, Joanna ; Lomsadze, Alexandre ; Pawełkowicz, Magdalena ; Siedlecka, Ewa ; Yagi, Kohei ; Pląder, Wojciech ; Seroczyńska, Anna ; Śmiech, Mieczysław ; Gutman, Wojciech ; Niemirowicz-Szczytt, Katarzyna ; Bartoszewski, Grzegorz ; Tagashira, Norikazu ; Hoshi, Yoshikazu ; Borodovsky, Mark ; Karpiński, Stanisław ; Malepszy, Stefan ; Przybecki, Zbigniew
    Cucumber (Cucumis sativus L.), a widely cultivated crop, originated in the Eastern Himalayas, and its secondary domestication regions include highly divergent climate conditions, e.g. temperate and subtropical. We wanted to uncover adaptive genome differences between the cucumber cultivars and the evolutionary molecular mechanisms that regulate genetic adaptation of plants to different ecosystems and organism biodiversity. Here we present the draft genome sequence of the Cucumis sativus genome of the North-European Borszczagowski cultivar (line B10) and comparative genomics studies with the known genomes of C. sativus (Chinese cultivar – Chinese Long (line 9930)), Arabidopsis thaliana, Populus trichocarpa and Oryza sativa. Cucumber genomes show extensive chromosomal rearrangements, distinct differences in quantity of particular genes (e.g. involved in photosynthesis, respiration, sugar metabolism, chlorophyll degradation, regulation of gene expression, photooxidative stress tolerance, higher non-optimal temperatures tolerance and ammonium ion assimilation) as well as in distributions of abscisic acid-, dehydration- and ethylene-responsive cis-regulatory elements (CREs) in promoters of orthologous groups of genes, which lead to the specific adaptation features. Abscisic acid treatment of non-acclimated Arabidopsis and C. sativus seedlings induced moderate freezing tolerance in Arabidopsis but not in C. sativus. This experiment, together with analysis of abscisic acid-specific CRE distributions, gives a clue as to why C. sativus is much more susceptible to moderate freezing stresses than A. thaliana. Comparative analysis of all five genomes showed that each species and/or cultivar has a specific profile of CRE content in promoters of orthologous genes. Our results constitute a substantial and original resource for basic and applied research on environmental adaptations of plants, which could facilitate creation of new crops with improved growth and yield in divergent conditions.
  • Item
    Gene discovery in EST sequences from the wheat leaf rust fungus Puccinia triticina sexual spores, asexual spores and haustoria, compared to other rust and corn smut fungi
    (Georgia Institute of Technology, 2011) Xu, Junhuan ; Linning, Rob ; Fellers, John ; Dickinson, Matthew ; Zhu, Wenhan ; Antonov, Ivan ; Joly, David L. ; Donaldson, Michael E. ; Eilam, Tamar ; Anikster, Yehoshua ; Banks, Travis ; Munro, Sarah ; Mayo, Michael ; Wynhoven, Brian ; Ali, Johar ; Moore, Richard ; McCallum, Brent ; Borodovsky, Mark ; Saville, Barry ; Bakkeren, Guus
    Background. Rust fungi are biotrophic basidiomycete plant pathogens that cause major diseases on plants and trees world-wide, affecting agriculture and forestry. Their biotrophic nature precludes many established molecular genetic manipulations and lines of research. The generation of genomic resources for these microbes is leading to novel insights into biology such as interactions with the hosts and guiding directions for breakthrough research in plant pathology. Results. To support gene discovery and gene model verification in the genome of the wheat leaf rust fungus, Puccinia triticina (Pt), we have generated Expressed Sequence Tags (ESTs) by sampling several life cycle stages. We focused on several spore stages and isolated haustorial structures from infected wheat, generating 17,684 ESTs. We produced sequences from both the sexual (pycniospores, aeciospores and teliospores) and asexual (germinated urediniospores) stages of the life cycle. From pycniospores and aeciospores, produced by infecting the alternate host, meadow rue (Thalictrum speciosissimum), 4,869 and 1,292 reads were generated, respectively. We generated 3,703 ESTs from teliospores produced on the senescent primary wheat host. Finally, we generated 6,817 reads from haustoria isolated from infected wheat as well as 1,003 sequences from germinated urediniospores. Along with 25,558 previously generated ESTs, we compiled a database of 13,328 non-redundant sequences (4,506 singlets and 8,822 contigs). Fungal genes were predicted using the EST version of the self-training GeneMarkS algorithm. To refine the EST database, we compared EST sequences by BLASTN to a set of 454 pyrosequencing-generated contigs and Sanger BAC-end sequences derived both from the Pt genome, and to ESTs and genome reads from wheat. A collection of 6,308 fungal genes was identified and compared to sequences of the cereal rusts, Puccinia graminis f. sp. tritici (Pgt) and stripe rust, P. striiformis f. sp. tritici (Pst), and poplar leaf rust Melampsora species, and the corn smut fungus, Ustilago maydis (Um). While extensive homologies were found, many genes appeared novel and species-specific; over 40% of genes did not match any known sequence in existing databases. Focusing on spore stages, direct comparison to Um identified potential functional homologs, possibly allowing heterologous functional analysis in that model fungus. Many potentially secreted protein genes were identified by similarity searches against genes and proteins of Pgt and Melampsora spp., revealing apparent orthologs. Conclusions. The current set of Pt unigenes contributes to gene discovery in this major cereal pathogen and will be invaluable for gene model verification in the genome sequence.
  • Item
    The Chlorella variabilis NC64A Genome Reveals Adaptation to Photosymbiosis, Coevolution with Viruses, and Cryptic Sex
    (Georgia Institute of Technology, 2010-09) Blanc, Guillaume ; Duncan, Garry ; Agarkova, Irina ; Borodovsky, Mark ; Gurnon, James ; Kuo, Alan ; Lindquist, Erika ; Lucas, Susan ; Pangilinan, Jasmyn ; Polle, Juergen ; Salamov, Asaf ; Terry, Astrid ; Yamada, Takashi ; Dunigan, David D. ; Grigoriev, Igor V. ; Claverie, Jean-Michel ; Van Etten, James L.
    Chlorella variabilis NC64A, a unicellular photosynthetic green alga (Trebouxiophyceae), is an intracellular photobiont of Paramecium bursaria and a model system for studying virus/algal interactions. We sequenced its 46-Mb nuclear genome, revealing an expansion of protein families that could have participated in adaptation to symbiosis. NC64A exhibits variations in GC content across its genome that correlate with global expression level, average intron size, and codon usage bias. Although Chlorella species have been assumed to be asexual and nonmotile, the NC64A genome encodes all the known meiosis-specific proteins and a subset of proteins found in flagella. We hypothesize that Chlorella might have retained a flagella-derived structure that could be involved in sexual reproduction. Furthermore, a survey of phytohormone pathways in chlorophyte algae identified algal orthologs of Arabidopsis thaliana genes involved in hormone biosynthesis and signaling, suggesting that these functions were established prior to the evolution of land plants. We show that the ability of Chlorella to produce chitinous cell walls likely resulted from the capture of metabolic genes by horizontal gene transfer from algal viruses, prokaryotes, or fungi. Analysis of the NC64A genome substantially advances our understanding of the green lineage evolution, including the genomic interplay with viruses and symbiosis between eukaryotes.
  • Item
    Bacillus anthracis genome organization in light of whole transcriptome sequencing
    (Georgia Institute of Technology, 2010) Martin, Jeffrey ; Zhu, Wenhan ; Passalacqua, Karla D. ; Bergman, Nicholas ; Borodovsky, Mark
    Emerging knowledge of whole prokaryotic transcriptomes could validate a number of theoretical concepts introduced in the early days of genomics. What are the rules connecting gene expression levels with sequence determinants such as quantitative scores of promoters and terminators? Are translation efficiency measures, e.g. codon adaptation index and RBS score related to gene expression? We used the whole transcriptome shotgun sequencing of a bacterial pathogen Bacillus anthracis to assess correlation of gene expression level with promoter, terminator and RBS scores, codon adaptation index, as well as with a new measure of gene translational efficiency, average translation speed. We compared computational predictions of operon topologies with the transcript borders inferred from RNA-Seq reads. Transcriptome mapping may also improve existing gene annotation. Upon assessment of accuracy of current annotation of protein-coding genes in the B. anthracis genome we have shown that the transcriptome data indicate existence of more than a hundred genes missing in the annotation though predicted by an ab initio gene finder. Interestingly, we observed that many pseudogenes possess not only a sequence with detectable coding potential but also promoters that maintain transcriptional activity.
  • Item
    Ab initio Gene Identification in Metagenomic Sequences
    (Georgia Institute of Technology, 2010) Zhu, Wenhan ; Lomsadze, Alexandre ; Borodovsky, Mark
    We describe an algorithm for gene identification in DNA sequences derived from shotgun sequencing of microbial communities. Accurate ab initio gene prediction in a short nucleotide sequence of anonymous origin is hampered by uncertainty in model parameters. While several machine learning approaches could be proposed to bypass this difficulty, one effective method is to estimate parameters from dependencies, formed in evolution, between frequencies of oligonucleotides in protein-coding regions and genome nucleotide composition. The original version of the method was proposed in 1999 and has been used since for (i) reconstructing the codon frequency vector needed for gene finding in viral genomes and (ii) initializing parameters of self-training gene finding algorithms. With the advent of new prokaryotic genomes en masse, it became possible to enhance the original approach by using direct polynomial and logistic approximations of oligonucleotide frequencies, as well as by separating models for bacteria and archaea. These advances have increased the accuracy of model reconstruction and, subsequently, gene prediction. We describe the refined method and assess its accuracy on known prokaryotic genomes split into short sequences. Also, we show that as a result of application of the new method, several thousand new genes could be added to existing annotations of several human and mouse gut metagenomes.
  • Item
    UBM: Quantitative systems biology
    (Georgia Institute of Technology, 2009-08-31) Borodovsky, Mark ; Choi, Jung H. ; Bunimovich, Leonid ; Goldsztein, Guillermo H.