Tissue-supervised VAE: training compendium, model weights, and embeddings (118K-sample bulk RNA-seq)

Item Type:Dataset
Title:Tissue-supervised VAE: training compendium, model weights, and embeddings (118K-sample bulk RNA-seq)
Creators: Pande, Amit (ORCID: https://orcid.org/0009-0008-0826-973X); Uyar, Bora (ORCID: https://orcid.org/0000-0002-3170-4890); Akalin, Altuna (ORCID: https://orcid.org/0000-0002-0468-0117)
Abstract:Data and trained model weights accompanying the manuscript "Tissue-supervised latent representations from a curated 118K-sample multi-source bulk RNA-seq compendium" (Pande, Uyar, Akalin; MDC Berlin / BIMSB). This deposit contains the training compendium, trained VAE weights, and pre-computed embeddings:
• processed_scaled_411k_tissue_B_h5.tar.gz (~9 GB unpacked): HDF5 training compendium with 118,263 train / 28,274 test samples, 42 UBERON tissues, and 16,115 genes.
• results_denoising_vae_411k_B.tar.gz (~18 GB unpacked): Standard and Denoising VAE weights, checkpoints, results.json, and target_v3_results.json.
• vae_tissue.final_model.pth (~7.5 GB): trained model weights used by the demo application.
• embeddings_train.csv, embeddings_test.csv: pre-computed 121-dimensional latent representations.
• ref_emb_bf93M.npy, tgt_emb_bf93M.npy: BulkFormer-93M embeddings of the reference and TARGET sets.
Analysis code and figure-generation scripts are available at https://github.com/BIMSBbioinfo/flexynesis_tissue_vae_manuscript. To reproduce the results, unpack the archives and follow the steps in the repository README.
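The unpacking step mentioned in the abstract could be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' procedure: the helper names `unpack_archive` and `peek_csv` are hypothetical, and the internal layout of the archives and the column layout of the embedding CSV files are not described in this record, so the sketch only extracts an archive and inspects the first rows of a CSV file rather than assuming any structure. The repository README remains the authoritative source for the reproduction steps.

```python
import csv
import tarfile
from pathlib import Path


def unpack_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the member names.

    Hypothetical helper: the record does not describe the archive layout.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        # Use the safer "data" extraction filter where this Python supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names


def peek_csv(csv_path, n_rows=3):
    """Return the header row and the first n_rows data rows of a CSV file.

    Hypothetical helper: makes no assumption about which columns hold
    sample identifiers, labels, or latent dimensions.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows
```

With the deposit downloaded into the working directory, a call such as `unpack_archive("processed_scaled_411k_tissue_B_h5.tar.gz", "data")` would extract the training compendium, and `peek_csv("embeddings_train.csv")` would show how the embedding columns are laid out before loading them with a numerical library.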
Source:Zenodo
Publisher:CERN
Date:12 June 2026
Official Publication:https://doi.org/10.5281/zenodo.20661013

Open Access
MDC Library