An expanded reference catalog of translated open reading frames for biomedical research

Original Article (PDF, 2MB)
Supplementary Data (Other, 63MB)

Item Type: Article
Title: An expanded reference catalog of translated open reading frames for biomedical research
Creators Name: Chothani, Sonia; Ruiz-Orera, Jorge; Tierney, Jack A.S.; Swirski, Michal I.; Tjeldnes, Hakon; Kok, Leron W.; Clauwaert, Jim; Deutsch, Eric W.; Alba, M. Mar; Aspden, Julie L.; Baranov, Pavel V.; Bazzini, Ariel Alejandro; Bruford, Elspeth A.; Brunet, Marie A.; Cardon, Tristan; Carvunis, Anne-Ruxandra; Casola, Claudio; Choudhary, Jyoti Sharma; Dean, Kellie; Faridi, Pouya; Fierro-Monti, Ivo; Fournier, Isabelle; Frankish, Adam; Gerstein, Mark; Hubner, Norbert; Jiang, Yunzhe; Kellis, Manolis; Martinez, Thomas F.; Menschaert, Gerben; Ni, Pengyu; Orchard, Sandra; Roucou, Xavier; Rozowsky, Joel; Salzet, Michel; Siragusa, Mauro; Slavoff, Sarah; Ternette, Nicola; Vizcaino, Juan Antonio; Wacholder, Aaron; Wu, Wei; Xie, Zhi; Yang, Yucheng T.; Moritz, Robert L.; Valen, Eivind; Mudge, Jonathan; van Heesch, Sebastiaan; Prensner, John R.; and Rackham, Owen J.L.
Abstract: Non-canonical (i.e. unannotated) open reading frames (ncORFs) have until recently been omitted from reference genome annotations, despite evidence of their translation, limiting their incorporation into biomedical research. To address this, in 2022, we initiated the TransCODE consortium and built the first community-driven consensus catalog of human ncORFs, which was openly distributed to the research community via Ensembl-GENCODE. While this catalog represented a starting point for reference ncORF annotation, major technical and scientific issues remained. In particular, this initial catalog had no standardized framework to judge the evidence of translation for individual ncORFs. Here, we present an expanded and refined catalog of the human reference annotation of ncORFs. By incorporating more datasets and by lifting constraints on ORF length and start codon, we define a comprehensive set of 28 359 ncORFs that is nearly four times the size of the previous catalog. Furthermore, to aid users who wish to work with ncORFs with the strongest and most reproducible signals of translation, we utilized a data-driven framework (i.e. translation signature scores) to assess the accumulated evidence for any individual ncORF. Using this approach, we derive a subset of 10 127 ncORFs with translation evidence on par with canonical protein-coding genes, which we refer to as the primary set. This set can serve as a reliable reference for downstream analyses and validation, with a particular emphasis on high quality. Overall, this update reflects continuous community-driven efforts to make ncORFs accessible and actionable to the broader research public, and further iterations of the catalog will continue to expand and refine this resource.
Keywords: Biomedical Research, Genetic Databases, Human Genome, Molecular Sequence Annotation, Open Reading Frames, Protein Biosynthesis
Source: Nucleic Acids Research
ISSN: 0305-1048
Publisher: Oxford University Press
Volume: 54
Number: 6
Page Range: gkag234
Date: 13 April 2026
Official Publication: https://doi.org/10.1093/nar/gkag234