Item Type: | Article |
---|---|
Title: | CNSistent integration and feature extraction from somatic copy number profiles |
Creators Name: | Streck, Adam and Schwarz, Roland F. |
Abstract: | BACKGROUND: Most cancers exhibit somatic copy number alterations (SCNAs)—gains and losses of variable regions of DNA. SCNAs play a key role in cancer adaptation through modulation of gene expression, deletion of tumor suppressor genes, or amplification of oncogenes. Systematic analysis of SCNAs is now a routine task in both the clinic and research and can help identify novel cancer genes, improve our understanding of cancer gene regulation, and enable us to accurately reconstruct cancer phylogenies. However, to conduct such analyses, SCNA profiles have to be integrated between samples, patients, and cohorts—often a nontrivial task, for which dedicated toolkits are lacking. RESULTS: To fill this gap, we developed CNSistent, a Python package for imputation, filtering, consistent segmentation, feature extraction, and visualization of cancer copy number profiles from heterogeneous datasets. We demonstrate the utility of CNSistent by applying it to the following publicly available cohorts: The Cancer Genome Atlas, Pan-Cancer Analysis of Whole Genomes, and TRAcking Cancer Evolution through therapy (Rx). We compare the effect of sample preprocessing and different segmentation and aggregation strategies on cancer type and subtype classification tasks using various classification models. We also evaluate how well a classifier trained on one cohort generalizes to another. Lastly, we introduce 2 segment-based peak and outlier scores to investigate relationships between segments, between samples, and between cancer types. Using these scores, we investigate non–small cell lung cancer samples, highlighting that SOX2 amplification is the dominant copy number alteration in lung squamous cell carcinoma and the main distinction to lung adenocarcinoma. CONCLUSIONS: CNSistent is a general-purpose toolkit for integrated processing of SCNA profiles across many patients and cohorts. It is available at https://bitbucket.org/schwarzlab/cnsistent. 
The Research Resource Identifier for CNSistent is SCR_027025. |
Keywords: | Cancer, Data Processing, SCNA, Deep Learning, Cancer Classification |
Source: | GigaScience |
ISSN: | 2047-217X |
Publisher: | Oxford University Press |
Volume: | 14 |
Page Range: | giaf104 |
Date: | 13 September 2025 |
Official Publication: | https://doi.org/10.1093/gigascience/giaf104 |