Item Type: | Article |
---|---|
Title: | mGene: Accurate SVM-based gene finding with an application to nematode genomes |
Creators Name: | Schweikert, G., Zien, A., Zeller, G., Behr, J., Dieterich, C., Ong, C.S., Philips, P., De Bona, F., Hartmann, L., Bohlen, A., Krueger, N., Sonnenburg, S. and Raetsch, G. |
Abstract: | We present the highly accurate gene prediction system mGene, which in an unprecedented manner combines the flexibility of generalized hidden Markov models with the predictive power of modern machine learning methods. Its excellent performance was proved in an objective competition based on the genome of the nematode Caenorhabditis elegans. Considering the average of sensitivity and specificity, the developmental version of mGene exhibited the best prediction performance on nucleotide, exon, and transcript level for ab initio and multiple-genome gene prediction tasks. The fully developed version shows superior performance in ten out of twelve evaluation criteria compared to the other gene finders, including Fgenesh and Augustus. An in-depth analysis of mGene's genome-wide predictions revealed that 2,200 predicted genes were not contained in the current genome annotation. Testing a subset of 57 of these genes by RT-PCR and sequencing, we confirmed expression for 24 (42%) of them. mGene missed 300 annotated genes, out of which 205 were unconfirmed. RT-PCR testing of 24 of these genes resulted in a success rate of merely 8%. These findings suggest that even the gene catalog of a well-studied organism such as C. elegans can be substantially improved by mGene predictions. We also provide gene predictions for the four nematodes C. briggsae, C. brenneri, C. japonica and C. remanei. They allow us to compare the resulting proteomes among these organisms and to the known protein universe, thereby identifying many species-specific gene inventions. In an assessment of the quality of several available annotations for these genomes, we find that mGene's predictions are most accurate. |
Keywords: | Algorithms, Artificial Intelligence, Caenorhabditis elegans, Computational Biology, Helminth Genes, Helminth Genome, Genomics, RNA Splice Sites, Reproducibility of Results, Reverse Transcriptase Polymerase Chain Reaction, DNA Sequence Analysis, Transcription Initiation Site, Animals |
Source: | Genome Research |
ISSN: | 1088-9051 |
Publisher: | Cold Spring Harbor Laboratory Press |
Volume: | 19 |
Number: | 11 |
Page Range: | 2133-2143 |
Date: | November 2009 |
Official Publication: | https://doi.org/10.1101/gr.090597.108 |