Evaluating the protein coding potential of exonized transposable element sequences

Piriyapongsa, Jittima; Rutledge, Mark T.; Patel, Sanil; Borodovsky, Mark; Jordan, I. King

dc.contributor.author	Piriyapongsa, Jittima	en_US
dc.contributor.author	Rutledge, Mark T.	en_US
dc.contributor.author	Patel, Sanil	en_US
dc.contributor.author	Borodovsky, Mark	en_US
dc.contributor.author	Jordan, I. King	en_US
dc.contributor.corporatename	Georgia Institute of Technology. School of Biology	en_US
dc.contributor.corporatename	Georgia Institute of Technology. Dept. of Biomedical Engineering	en_US
dc.contributor.corporatename	Emory University. Dept. of Biomedical Engineering	en_US
dc.contributor.corporatename	Georgia Institute of Technology. Division of Computational Science and Engineering	en_US
dc.contributor.corporatename	Georgia Institute of Technology. College of Computing	en_US
dc.date.accessioned	2011-12-22T19:59:36Z
dc.date.available	2011-12-22T19:59:36Z
dc.date.issued	2007-11-26
dc.description	© 2007 Piriyapongsa et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.	en_US
dc.description	DOI: 10.1186/1745-6150-2-31	en_US
dc.description.abstract	Background: Transposable element (TE) sequences, once thought to be merely selfish or parasitic members of the genomic community, have been shown to contribute a wide variety of functional sequences to their host genomes. Analysis of complete genome sequences has turned up numerous cases where TE sequences have been incorporated as exons into mRNAs, and it is widely assumed that such 'exonized' TEs encode protein sequences. However, the extent to which TE-derived sequences actually encode proteins is unknown and a matter of some controversy. We have tried to address this outstanding issue from two perspectives: (i) by evaluating ascertainment biases related to the search methods used to uncover TE-derived protein coding sequences (CDS) and (ii) through a probabilistic codon-frequency based analysis of the protein coding potential of TE-derived exons. Results: We compared the ability of three classes of sequence similarity search methods to detect TE-derived sequences among data sets of experimentally characterized proteins: (1) a profile-based hidden Markov model (HMM) approach, (2) BLAST methods and (3) RepeatMasker. Profile-based methods are more sensitive and more selective than the other methods evaluated. However, the application of profile-based search methods to the detection of TE-derived sequences among well-curated experimentally characterized protein data sets did not turn up many more cases than had been previously detected and nowhere near as many cases as recent genome-wide searches have. We observed that the different search methods used were complementary in the sense that they yielded largely non-overlapping sets of hits and differed in their ability to recover known cases of TE-derived CDS. The probabilistic analysis of TE-derived exon sequences indicates that these sequences have low protein coding potential on average. In particular, non-autonomous TEs that do not encode protein sequences, such as Alu elements, are frequently exonized but unlikely to encode protein sequences. Conclusion: The exaptation of the numerous TE sequences found in exons as bona fide protein coding sequences may prove to be far less common than has been suggested by the analysis of complete genomes. We hypothesize that many exonized TE sequences actually function as post-transcriptional regulators of gene expression, rather than coding sequences, which may act through a variety of double-stranded RNA-related regulatory pathways. Indeed, their relatively high copy numbers and similarity to sequences dispersed throughout the genome suggest that exonized TE sequences could serve as master regulators with a wide scope of regulatory influence.	en_US
dc.identifier.citation	Piriyapongsa, J., Rutledge, M.T., Patel, S., Borodovsky, M. and I.K. Jordan, 2007. Evaluating the protein coding potential of exonized transposable element sequences. Biol. Direct 2: 31	en_US
dc.identifier.doi	10.1186/1745-6150-2-31
dc.identifier.issn	1745-6150
dc.identifier.uri	http://hdl.handle.net/1853/42111
dc.language.iso	en_US	en_US
dc.publisher	Georgia Institute of Technology	en_US
dc.publisher.original	BioMed Central	en_US
dc.subject	Transposable elements	en_US
dc.subject	Post-transcriptional regulators	en_US
dc.subject	Gene expression	en_US
dc.subject	TEs	en_US
dc.subject	Gene evolution	en_US
dc.title	Evaluating the protein coding potential of exonized transposable element sequences	en_US
dc.type	Text
dc.type.genre	Article
dspace.entity.type	Publication
local.contributor.author	Jordan, I. King
local.contributor.author	Borodovsky, Mark
local.contributor.corporatename	College of Sciences
local.contributor.corporatename	School of Biological Sciences
relation.isAuthorOfPublication	1c155699-6f2d-418d-83cd-9e1424896d4f
relation.isAuthorOfPublication	fa975b84-f807-4cec-93a6-9df633afb791
relation.isOrgUnitOfPublication	85042be6-2d68-4e07-b384-e1f908fae48a
relation.isOrgUnitOfPublication	c8b3bd08-9989-40d3-afe3-e0ad8d5c72b5

Files

Original bundle

Name: Jordan_BMC_2007_001.pdf
Size: 2.29 MB
Format: Adobe Portable Document Format

Collections

Research Publications