Helmholtz Gemeinschaft

CNSistent integration and feature extraction from somatic copy number profiles

Item Type: Article
Title: CNSistent integration and feature extraction from somatic copy number profiles
Creators Name: Streck, Adam and Schwarz, Roland F.
Abstract:
BACKGROUND: Most cancers exhibit somatic copy number alterations (SCNAs): gains and losses of variable regions of DNA. SCNAs play a key role in cancer adaptation through modulation of gene expression, deletion of tumor suppressor genes, or amplification of oncogenes. Systematic analysis of SCNAs is now a routine task in both the clinic and research and can help identify novel cancer genes, improve our understanding of cancer gene regulation, and enable us to accurately reconstruct cancer phylogenies. However, to conduct such analyses, SCNA profiles have to be integrated between samples, patients, and cohorts, often a nontrivial task for which dedicated toolkits are lacking.
RESULTS: To fill this gap, we developed CNSistent, a Python package for imputation, filtering, consistent segmentation, feature extraction, and visualization of cancer copy number profiles from heterogeneous datasets. We demonstrate the utility of CNSistent by applying it to the following publicly available cohorts: The Cancer Genome Atlas, Pan-Cancer Analysis of Whole Genomes, and TRAcking Cancer Evolution through therapy (Rx). We compare the effect of sample preprocessing and different segmentation and aggregation strategies on cancer type and subtype classification tasks using various classification models. We also evaluate how well a classifier trained on one cohort generalizes to another. Lastly, we introduce 2 segment-based peak and outlier scores to investigate relationships between segments, between samples, and between cancer types. Using these scores, we investigate non–small cell lung cancer samples, highlighting that SOX2 amplification is the dominant copy number alteration in lung squamous cell carcinoma and the main distinction to lung adenocarcinoma.
CONCLUSIONS: CNSistent is a general-purpose toolkit for integrated processing of SCNA profiles across many patients and cohorts. It is available at https://bitbucket.org/schwarzlab/cnsistent. The Research Resource Identifier for CNSistent is SCR_027025.
Keywords: Cancer, Data Processing, SCNA, Deep Learning, Cancer Classification
Source: GigaScience
ISSN: 2047-217X
Publisher: Oxford University Press
Volume: 14
Page Range: giaf104
Date: 13 September 2025
Official Publication: https://doi.org/10.1093/gigascience/giaf104