Haplotype-resolved diverse human genomes and integrated analysis of structural variation

Creators Name: Ebert, P. and Audano, P.A. and Zhu, Q. and Rodriguez-Martin, B. and Porubsky, D. and Bonder, M.J. and Sulovari, A. and Ebler, J. and Zhou, W. and Serra Mari, R. and Yilmaz, F. and Zhao, X. and Hsieh, P.H. and Lee, J. and Kumar, S. and Lin, J. and Rausch, T. and Chen, Y. and Ren, J. and Santamarina, M. and Höps, W. and Ashraf, H. and Chuang, N.T. and Yang, X. and Munson, K.M. and Lewis, A.P. and Fairley, S. and Tallon, L.J. and Clarke, W.E. and Basile, A.O. and Byrska-Bishop, M. and Corvelo, A. and Evani, U.S. and Lu, T.Y. and Chaisson, M.J.P. and Chen, J. and Li, C. and Brand, H. and Wenger, A.M. and Ghareghani, M. and Harvey, W.T. and Raeder, B. and Hasenfeld, P. and Regier, A.A. and Abel, H.J. and Hall, I.M. and Flicek, P. and Stegle, O. and Gerstein, M.B. and Tubio, J.M.C. and Mu, Z. and Li, Y.I. and Shi, X. and Hastie, A.R. and Ye, K. and Chong, Z. and Sanders, A.D. and Zody, M.C. and Talkowski, M.E. and Mills, R.E. and Devine, S.E. and Lee, C. and Korbel, J.O. and Marschall, T. and Eichler, E.E.
Abstract: Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average contig N50: 26 Mbp) integrate all forms of genetic variation even across complex loci. We identify 107,590 structural variants (SVs), of which 68% are not discovered by short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterize 130 of the most active mobile element source elements and find that 63% of all SVs arise by homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1,526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
Keywords: DNA Sequence Analysis, Genetic Variation, Genotype, Haplotypes, High-Throughput Nucleotide Sequencing, Human Genome, INDEL Mutation, Interspersed Repetitive Sequences, Population Groups, Quantitative Trait Loci, Retroelements, Sequence Inversion, Whole Genome Sequencing
Publisher: American Association for the Advancement of Science
Page Range: eabf7117
Date: 25 February 2021
Official Publication: https://doi.org/10.1126/science.abf7117

