Helmholtz Gemeinschaft


Separating sparse signals from correlated noise in binary classification

Item Type:Conference or Workshop Item
Title:Separating sparse signals from correlated noise in binary classification
Creators Name:Mandt, S., Wenzel, F., Nakajima, S., Lippert, C. and Kloft, M.
Abstract:Among the goals of statistical genetics is to find sparse associations of genetic data with binary phenotypes, such as heritable diseases. Often, the data are obfuscated by confounders such as age, ancestry, or population structure. A widely appreciated modeling paradigm which corrects for such confounding relies on linear mixed models. These are linear regression models with correlated noise, where the noise covariance captures similarities between the samples. We generalize this modeling paradigm to binary classification. We thereby face the technical challenge that that marginalizing over the noise leads to an intractable, high-dimensional integral. We propose a variational EM algorithm to overcome this problem, where the global model parameters are `1-norm regularized, leading to a sparse solution. The selected features are much less affected by the spurious correlations in the data, manifested by a smaller correlation between the features and the first principal component of the noise covariance. The proposed method also outperforms Gaussian process classification and uncorrelated probit regression in terms of prediction performance. In addition, we discuss ongoing work on employing stochastic gradient MCMC for this problem class.
Source:CEUR Workshop Proceedings
Page Range:48-58

Repository Staff Only: item control page

Open Access
MDC Library