Evaluating the protein coding potential of exonized transposable element sequences

dc.contributor.author Piriyapongsa, Jittima en_US
dc.contributor.author Rutledge, Mark T. en_US
dc.contributor.author Patel, Sanil en_US
dc.contributor.author Borodovsky, Mark en_US
dc.contributor.author Jordan, I. King en_US
dc.contributor.corporatename Georgia Institute of Technology. School of Biology en_US
dc.contributor.corporatename Georgia Institute of Technology. Dept. of Biomedical Engineering en_US
dc.contributor.corporatename Emory University. Dept. of Biomedical Engineering en_US
dc.contributor.corporatename Georgia Institute of Technology. Division of Computational Science and Engineering en_US
dc.contributor.corporatename Georgia Institute of Technology. College of Computing en_US
dc.date.accessioned 2011-12-22T19:59:36Z
dc.date.available 2011-12-22T19:59:36Z
dc.date.issued 2007-11-26
dc.description © 2007 Piriyapongsa et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. en_US
dc.description DOI: 10.1186/1745-6150-2-31 en_US
dc.description.abstract Background: Transposable element (TE) sequences, once thought to be merely selfish or parasitic members of the genomic community, have been shown to contribute a wide variety of functional sequences to their host genomes. Analysis of complete genome sequences has turned up numerous cases where TE sequences have been incorporated as exons into mRNAs, and it is widely assumed that such 'exonized' TEs encode protein sequences. However, the extent to which TE-derived sequences actually encode proteins is unknown and a matter of some controversy. We have tried to address this outstanding issue from two perspectives: (i) by evaluating ascertainment biases related to the search methods used to uncover TE-derived protein coding sequences (CDS) and (ii) through a probabilistic codon-frequency based analysis of the protein coding potential of TE-derived exons. Results: We compared the ability of three classes of sequence similarity search methods to detect TE-derived sequences among data sets of experimentally characterized proteins: (1) a profile-based hidden Markov model (HMM) approach, (2) BLAST methods and (3) RepeatMasker. Profile-based methods are more sensitive and more selective than the other methods evaluated. However, the application of profile-based search methods to the detection of TE-derived sequences among well-curated experimentally characterized protein data sets did not turn up many more cases than had been previously detected and nowhere near as many cases as recent genome-wide searches have. We observed that the different search methods used were complementary in the sense that they yielded largely non-overlapping sets of hits and differed in their ability to recover known cases of TE-derived CDS. The probabilistic analysis of TE-derived exon sequences indicates that these sequences have low protein coding potential on average. 
In particular, non-autonomous TEs that do not encode protein sequences, such as Alu elements, are frequently exonized but unlikely to encode protein sequences. Conclusion: The exaptation of the numerous TE sequences found in exons as bona fide protein coding sequences may prove to be far less common than has been suggested by the analysis of complete genomes. We hypothesize that many exonized TE sequences actually function as post-transcriptional regulators of gene expression, rather than coding sequences, which may act through a variety of double-stranded RNA-related regulatory pathways. Indeed, their relatively high copy numbers and similarity to sequences dispersed throughout the genome suggest that exonized TE sequences could serve as master regulators with a wide scope of regulatory influence. en_US
dc.identifier.citation Piriyapongsa, J., Rutledge, M.T., Patel, S., Borodovsky, M. and I.K. Jordan, 2007. Evaluating the protein coding potential of exonized transposable element sequences. Biol. Direct 2: 31 en_US
dc.identifier.doi 10.1186/1745-6150-2-31
dc.identifier.issn 1745-6150
dc.identifier.uri http://hdl.handle.net/1853/42111
dc.language.iso en_US en_US
dc.publisher Georgia Institute of Technology en_US
dc.publisher.original BioMed Central en_US
dc.subject Transposable elements en_US
dc.subject Post-transcriptional regulators en_US
dc.subject Gene expression en_US
dc.subject TEs en_US
dc.subject Gene evolution en_US
dc.title Evaluating the protein coding potential of exonized transposable element sequences en_US
dc.type Text
dc.type.genre Article
dspace.entity.type Publication
local.contributor.author Jordan, I. King
local.contributor.author Borodovsky, Mark
local.contributor.corporatename College of Sciences
local.contributor.corporatename School of Biological Sciences
relation.isAuthorOfPublication 1c155699-6f2d-418d-83cd-9e1424896d4f
relation.isAuthorOfPublication fa975b84-f807-4cec-93a6-9df633afb791
relation.isOrgUnitOfPublication 85042be6-2d68-4e07-b384-e1f908fae48a
relation.isOrgUnitOfPublication c8b3bd08-9989-40d3-afe3-e0ad8d5c72b5