Helmholtz Gemeinschaft

Computational pan-genomics: status, promises and challenges

PDF (Original Article), 1 MB
PDF (Supplementary Data), 64 kB

Item Type:Article
Title:Computational pan-genomics: status, promises and challenges
Abstract:Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient to leverage the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid expansion of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine existing approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies, and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from representing reference genomes as strings to representing them as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to raise awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.
Keywords:Pan-Genome, Sequence Graph, Read Mapping, Haplotypes, Data Structures
Source:Briefings in Bioinformatics
ISSN:1467-5463
Publisher:Oxford University Press
Volume:19
Number:1
Page Range:118-135
Date:January 2018
Additional Information:Ashley D. Sanders is a member of the Computational Pan-Genomics Consortium. The Consortium formed at a workshop held from 8 to 12 June 2015 at the Lorentz Center in Leiden, the Netherlands, with the purpose of providing a cross-disciplinary overview of the emerging discipline of Computational Pan-Genomics. The workshop was organized by Victor Guryev, Tobias Marschall, Alexander Schönhuth (chair), Fabio Vandin, and Kai Ye. Consortium members are listed at the end of this article. Corresponding author: Tobias Marschall, Center for Bioinformatics at Saarland University and Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany. Tel.: +49 681 302 70880; E-mail: t.marschall@mpi-inf.mpg.de
Official Publication:https://doi.org/10.1093/bib/bbw089
External Fulltext:View full text on PubMed Central
PubMed:View item in PubMed
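The abstract highlights the shift from representing a reference genome as a string to representing it as a graph. The sketch below is a hypothetical toy illustration, not the consortium's data model: a directed sequence graph whose nodes carry DNA labels, in which a reference allele and a single-nucleotide variant share their flanking nodes, and each genome is recovered by spelling a path. The class name `SequenceGraph` and its methods are illustrative assumptions; practical pan-genome graph formats and tools use richer models (for example bidirected edges that account for reverse complements).

```python
from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Hypothetical toy sequence graph: nodes carry DNA labels, edges are directed."""
    nodes: dict[int, str] = field(default_factory=dict)
    edges: dict[int, list[int]] = field(default_factory=dict)

    def add_node(self, node_id: int, seq: str) -> None:
        self.nodes[node_id] = seq
        self.edges.setdefault(node_id, [])

    def add_edge(self, src: int, dst: int) -> None:
        self.edges[src].append(dst)

    def spell(self, path: list[int]) -> str:
        # Verify that consecutive nodes are connected, then concatenate labels.
        for a, b in zip(path, path[1:]):
            if b not in self.edges[a]:
                raise ValueError(f"no edge {a}->{b}")
        return "".join(self.nodes[n] for n in path)

    def all_paths(self, start: int, end: int) -> list[list[int]]:
        # Depth-first enumeration of start-to-end paths; assumes the graph is acyclic.
        if start == end:
            return [[end]]
        return [[start] + rest
                for nxt in self.edges[start]
                for rest in self.all_paths(nxt, end)]


# Shared flanks "ACG" and "ACGT" are stored once; nodes 2 and 3 form a SNP bubble (T/C).
g = SequenceGraph()
for node_id, seq in [(1, "ACG"), (2, "T"), (3, "C"), (4, "ACGT")]:
    g.add_node(node_id, seq)
for src, dst in [(1, 2), (1, 3), (2, 4), (3, 4)]:
    g.add_edge(src, dst)

print(g.spell([1, 2, 4]))  # ACGTACGT (reference path)
print(g.spell([1, 3, 4]))  # ACGCACGT (variant path)
print([g.spell(p) for p in g.all_paths(1, 4)])  # ['ACGTACGT', 'ACGCACGT']
```

A single bubble encodes two sequences while storing the shared flanks only once; with n such bubbles placed one after another, the number of spellable paths grows as 2^n while the number of nodes grows only linearly, which is one reason graphs are attractive as references for large genome collections.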

Open Access
MDC Library