Metadata-guided feature disentanglement for functional genomics

Item Type: Article
Title: Metadata-guided feature disentanglement for functional genomics
Creators Name: Rakowski, A., Monti, R., Huryn, V., Lemanczyk, M., Ohler, U. and Lippert, C.
Abstract: With the development of high-throughput technologies, genomics datasets rapidly grow in size, including functional genomics data. This has allowed the training of large Deep Learning (DL) models to predict epigenetic readouts, such as protein binding or histone modifications, from genome sequences. However, large dataset sizes come at the price of data consistency, as they often aggregate results from a large number of studies conducted under varying experimental conditions. While data from large-scale consortia are useful because they allow studying the effects of different biological conditions, they can also contain unwanted biases from confounding experimental factors. Here, we introduce Metadata-guided Feature Disentanglement (MFD), an approach that allows disentangling biologically relevant features from potential technical biases. MFD incorporates target metadata into model training by conditioning the weights of the model output layer on different experimental factors. It then separates the factors into disjoint groups and enforces independence of the corresponding feature subspaces with an adversarially learned penalty. We show that the metadata-driven disentanglement approach allows for better model introspection by connecting latent features to experimental factors, while maintaining or even improving performance in downstream tasks such as enhancer prediction or genetic variant discovery. The code will be made available at https://github.com/HealthML/MFD.
Keywords: Deep Learning, Genomics, Metadata
Source: Bioinformatics
ISSN: 1367-4803
Publisher: Oxford University Press
Volume: 40
Number: Suppl 2
Page Range: ii4-ii10
Date: September 2024
Official Publication: https://doi.org/10.1093/bioinformatics/btae403
PubMed: View item in PubMed
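
The abstract describes conditioning the output-layer weights on experimental factors that are split into disjoint groups. The NumPy sketch below illustrates one way such conditioning could look; it is not the authors' implementation. The function names, the two factor groups ("biological" and "technical"), the embedding sizes, and the choice to build each target's weight vector by concatenating per-group embeddings are all assumptions made for illustration. The adversarially learned independence penalty mentioned in the abstract is omitted.

```python
import numpy as np


def conditioned_output_weights(bio_ids, tech_ids, bio_emb, tech_emb):
    """Build one output weight vector per target from its metadata.

    Hypothetical scheme: each target's weights are the concatenation of
    the embedding of its biological factor and the embedding of its
    technical factor, so each factor group acts on its own feature subspace.
    """
    return np.concatenate([bio_emb[bio_ids], tech_emb[tech_ids]], axis=1)


def predict_logits(features, weights):
    """Linear output layer: one logit per (sample, target) pair."""
    return features @ weights.T


rng = np.random.default_rng(0)

# Assumed sizes: 3 biological factor values, 2 technical factor values.
n_bio, n_tech = 3, 2
d_bio, d_tech = 4, 2  # sizes of the two feature subspaces

bio_emb = rng.normal(size=(n_bio, d_bio))     # would be learned in training
tech_emb = rng.normal(size=(n_tech, d_tech))  # would be learned in training

# Metadata for 4 hypothetical targets (e.g. tracks in a consortium dataset).
bio_ids = np.array([0, 0, 1, 2])
tech_ids = np.array([0, 1, 1, 0])

weights = conditioned_output_weights(bio_ids, tech_ids, bio_emb, tech_emb)

# Stand-in for sequence features produced by the model body, 5 samples.
features = rng.normal(size=(5, d_bio + d_tech))
logits = predict_logits(features, weights)

print(weights.shape, logits.shape)
```

In this sketch, targets that share a biological factor value share the first `d_bio` weight entries and differ only in the technical part, which is what would allow latent features to be connected to specific experimental factors.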