Helmholtz Gemeinschaft


Identifying single copy orthologs in Metazoa


Item Type:Article
Title:Identifying single copy orthologs in Metazoa
Creators Name:Creevey, C.J., Muller, J., Doerks, T., Thompson, J.D., Arendt, D. and Bork, P.
Abstract:The identification of single copy (1-to-1) orthologs in any group of organisms is important for functional classification and phylogenetic studies. The Metazoa are no exception, but only recently has there been a wide enough distribution of taxa with sufficiently high quality sequenced genomes to gain confidence in the widespread single copy status of a gene. Here, we present a phylogenetic approach for identifying overlooked single copy orthologs from multigene families and apply it to the Metazoa. Using 18 sequenced metazoan genomes of high quality, we identified a robust set of 1,126 orthologous groups that have been retained in single copy since the last common ancestor of Metazoa. We found that the phylogenetic procedure increased the number of single copy orthologs found by over a third relative to standard taxon-count approaches. The orthologs represented a wide range of functional categories, expression profiles and levels of divergence. To demonstrate the value of our set of single copy orthologs, we used them to assess the completeness of 24 currently published metazoan genomes and 62 EST datasets. We found that the annotated genes in published genomes vary in coverage from 79% (Ciona intestinalis) to 99.8% (human) with an average of 92%, suggesting a value for the underlying error rate in genome annotation, and a strategy for identifying single copy orthologs in larger datasets. In contrast, the vast majority of EST datasets with no corresponding genome sequence available are largely under-sampled and probably do not accurately represent the actual genomic complement of the organisms from which they are derived.
Keywords:Expressed Sequence Tags, Gene Dosage, Genetic Databases, Genome, Genomics, Molecular Evolution, Multigene Family, Phylogeny, Animals
Source:PLoS Computational Biology
ISSN:1553-734X
Publisher:Public Library of Science
Volume:7
Number:12
Page Range:e1002269
Date:1 December 2011
Official Publication:https://doi.org/10.1371/journal.pcbi.1002269
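
The abstract contrasts the authors' phylogenetic procedure with standard taxon-count approaches and reports genome completeness as a percentage of single copy orthologs recovered. As a rough illustration of those two baseline ideas only (not the paper's phylogenetic method), the sketch below filters orthologous groups by counting genes per species and then computes a coverage percentage. The function names, the data layout and the toy group identifiers (OG1 and so on) are assumptions made for this example and do not come from the article.

```python
from collections import Counter


def single_copy_groups(groups, species):
    """Taxon-count filter (illustrative): keep groups that contain
    exactly one gene from every species in `species` and no others."""
    wanted = set(species)
    kept = []
    for name, members in groups.items():
        # members is a list of (species, gene_id) pairs
        counts = Counter(sp for sp, _gene in members)
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            kept.append(name)
    return sorted(kept)


def completeness(reference_groups, found_groups):
    """Percentage of reference single copy groups detected in one genome."""
    reference = set(reference_groups)
    if not reference:
        raise ValueError("reference set is empty")
    return 100.0 * len(reference & set(found_groups)) / len(reference)


# Hypothetical toy data: three groups across three species.
toy_groups = {
    "OG1": [("human", "h1"), ("fly", "f1"), ("worm", "w1")],
    "OG2": [("human", "h2"), ("human", "h3"), ("fly", "f2"), ("worm", "w2")],
    "OG3": [("human", "h4"), ("fly", "f3")],
}
toy_species = ["human", "fly", "worm"]

if __name__ == "__main__":
    print(single_copy_groups(toy_groups, toy_species))
    print(completeness(["OG1", "OG2", "OG3", "OG4"], ["OG1", "OG3", "OG4"]))
```

In this toy input OG2 is rejected because one species has two copies, and OG3 because one species is missing. According to the abstract, groups like OG2 are where a phylogenetic procedure can recover single copy orthologs that a plain taxon count overlooks.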
