Semi-automated assembly of high-quality diploid human reference genomes

Item Type:Article
Title:Semi-automated assembly of high-quality diploid human reference genomes
Creators Name:Jarvis, E.D. and Formenti, G. and Rhie, A. and Guarracino, A. and Yang, C. and Wood, J. and Tracey, A. and Thibaud-Nissen, F. and Vollger, M.R. and Porubsky, D. and Cheng, H. and Asri, M. and Logsdon, G.A. and Carnevali, P. and Chaisson, M.J.P. and Chin, C.S. and Cody, S. and Collins, J. and Ebert, P. and Escalona, M. and Fedrigo, O. and Fulton, R.S. and Fulton, L.L. and Garg, S. and Gerton, J.L. and Ghurye, J. and Granat, A. and Green, R.E. and Harvey, W. and Hasenfeld, P. and Hastie, A. and Haukness, M. and Jaeger, E.B. and Jain, M. and Kirsche, M. and Kolmogorov, M. and Korbel, J.O. and Koren, S. and Korlach, J. and Lee, J. and Li, D. and Lindsay, T. and Lucas, J. and Luo, F. and Marschall, T. and Mitchell, M.W. and McDaniel, J. and Nie, F. and Olsen, H.E. and Olson, N.D. and Pesout, T. and Potapova, T. and Puiu, D. and Regier, A. and Ruan, J. and Salzberg, S.L. and Sanders, A.D. and Schatz, M.C. and Schmitt, A. and Schneider, V.A. and Selvaraj, S. and Shafin, K. and Shumate, A. and Stitziel, N.O. and Stober, C. and Torrance, J. and Wagner, J. and Wang, J. and Wenger, A. and Xiao, C. and Zimin, A.V. and Zhang, G. and Wang, T. and Li, H. and Garrison, E. and Haussler, D. and Hall, I. and Zook, J.M. and Eichler, E.E. and Phillippy, A.M. and Paten, B. and Howe, K. and Miga, K.H.
Abstract:The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.
Keywords:Chromosome Mapping, Diploidy, DNA Sequence Analysis, Human Genome, Haplotypes, High-Throughput Nucleotide Sequencing, Pregnancy
Publisher:Nature Publishing Group
Page Range:519-531
Date:17 November 2022
Official Publication:https://doi.org/10.1038/s41586-022-05325-5
