Partitioned learning of deep Boltzmann machines for SNP data

Item Type:Article
Title:Partitioned learning of deep Boltzmann machines for SNP data
Creators Name:Hess, M. and Lenz, S. and Blätte, T.J. and Bullinger, L. and Binder, H.
Abstract:MOTIVATION: Learning the joint distributions of measurements, and in particular identification of an appropriate low-dimensional manifold, has been found to be a powerful ingredient of deep learning approaches. Yet, such approaches have hardly been applied to single nucleotide polymorphism (SNP) data, probably due to the high number of features typically exceeding the number of studied individuals. RESULTS: After a brief overview of how deep Boltzmann machines (DBMs), a deep learning approach, can be adapted to SNP data in principle, we specifically present a way to alleviate the dimensionality problem by partitioned learning. We propose a sparse regression approach to coarsely screen the joint distribution of SNPs, followed by training several DBMs on SNP partitions that were identified by the screening. Aggregate features representing SNP patterns and the corresponding SNPs are extracted from the DBMs by a combination of statistical tests and sparse regression. In simulated case–control data, we show how this can uncover complex SNP patterns and augment results from univariate approaches, while maintaining type 1 error control. Time-to-event endpoints are considered in an application with acute myeloid leukemia patients, where SNP patterns are modeled after a pre-screening based on gene expression data. The proposed approach identified three SNPs that seem to jointly influence survival in a validation dataset. This indicates the added value of jointly investigating SNPs compared to standard univariate analyses and makes partitioned learning of DBMs an interesting complementary approach when analyzing SNP data. AVAILABILITY AND IMPLEMENTATION: A Julia package is provided at ‘http://github.com/binderh/BoltzmannMachines.jl’.
Keywords:Computational Biology, Leukemic Gene Expression Regulation, Machine Learning, Myeloid Leukemia, Single Nucleotide Polymorphism, Software
Source:Bioinformatics
ISSN:1367-4803
Publisher:Oxford University Press
Volume:33
Number:20
Page Range:3173-3180
Date:October 2017
Official Publication:https://doi.org/10.1093/bioinformatics/btx408

