Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads

Item Type: Article
Title: Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads
Creators: Porubsky, D. (ORCID: https://orcid.org/0000-0001-8414-8966), Ebert, P. (ORCID: https://orcid.org/0000-0001-7441-532X), Audano, P.A., Vollger, M.R. (ORCID: https://orcid.org/0000-0002-8651-1615), Harvey, W.T., Marijon, P. (ORCID: https://orcid.org/0000-0002-6694-6873), Ebler, J., Munson, K.M. (ORCID: https://orcid.org/0000-0001-8413-6498), Sorensen, M., Sulovari, A. (ORCID: https://orcid.org/0000-0003-4354-9020), Haukness, M. (ORCID: https://orcid.org/0000-0001-9991-8089), Ghareghani, M., Lansdorp, P.M., Paten, B., Devine, S.E., Sanders, A.D. (ORCID: https://orcid.org/0000-0003-3945-0677), Lee, C., Chaisson, M.J.P., Korbel, J.O. (ORCID: https://orcid.org/0000-0002-2798-3794), Eichler, E.E. (ORCID: https://orcid.org/0000-0002-8246-4014) and Marschall, T. (ORCID: https://orcid.org/0000-0002-9376-1030)
Abstract: Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing with continuous long-read or high-fidelity sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.
Keywords: Algorithms, DNA Sequence Analysis, Haplotypes, High-Throughput Nucleotide Sequencing, Human Genome, Parents, Puerto Rico, Single-Cell Analysis
Source: Nature Biotechnology
ISSN: 1087-0156
Publisher: Nature Publishing Group
Volume: 39
Number: 3
Page Range: 302-308
Date: March 2021
Official Publication: https://doi.org/10.1038/s41587-020-0719-5
