

MEDICC2: whole-genome doubling aware copy-number phylogenies for cancer evolution

Item Type: Preprint
Title: MEDICC2: whole-genome doubling aware copy-number phylogenies for cancer evolution
Creators Name: Kaufmann, T.L. and Petkovic, M. and Watkins, T.B.K. and Colliver, E.C. and Laskina, S. and Thapa, N. and Minussi, D.C. and Navin, N. and Swanton, C. and Van Loo, P. and Haase, K. and Tarabichi, M. and Schwarz, R.F.
Abstract: Chromosomal instability (CIN) and somatic copy-number alterations (SCNAs) play a key role in the evolutionary process that shapes cancer genomes. SCNAs comprise many classes of clinically relevant events, such as localised amplifications, gains, losses, loss-of-heterozygosity (LOH) events, and recently discovered parallel evolutionary events revealed by multi-sample phasing. These events frequently appear jointly with whole-genome doubling (WGD), a transformative event in tumour evolution that involves tetraploidization of genomes, is preceded or followed by individual chromosomal copy-number changes, and is associated with an overall increase in structural CIN. While SCNAs have been leveraged for phylogeny reconstruction in the past, existing methods do not take WGD events into account and cannot model parallel evolution. They frequently rely on the infinite sites assumption, do not model horizontal dependencies between adjacent genomic loci, and cannot infer ancestral genomes. Here we present MEDICC2, a new phylogeny inference algorithm for allele-specific SCNA data that addresses these shortcomings. MEDICC2 dispenses with the infinite sites assumption, models parallel evolution and accurately identifies clonal and subclonal WGD events. It times SCNAs relative to each other, quantifies SCNA burden in single-sample studies and infers phylogenetic trees and ancestral genomes in multi-sample or single-cell sequencing scenarios with thousands of cells. We demonstrate MEDICC2's ability on simulated data, on real-world data from 2,778 single-sample tumours from the Pan-Cancer Analysis of Whole Genomes (PCAWG), on bulk multi-region data from 10 prostate cancer patients, and on two recent single-cell datasets of triple-negative breast cancer comprising several thousand single cells.
Publisher: Cold Spring Harbor Laboratory Press
Article Number: 2021.02.28.433227
Date: 6 September 2021
Official Publication: https://doi.org/10.1101/2021.02.28.433227



