Abstract
Background: Deep learning (DL) is increasingly becoming the state of the art for the analysis of next-generation sequencing data such as RNA-seq and single-cell RNA-seq, owing to its ability to capture complex patterns in the data. In particular, variational autoencoders (VAEs) have been used for a variety of tasks ranging from batch effect removal to data integration. One disadvantage of DL is its limited interpretability, which stems from the non-linear nature of the models. Interpretability, however, is crucial, especially in the biomedical context.
Results: In this thesis, we developed OntoVAE, an interpretable VAE model whose latent space and decoder reflect a biological regulatory network. OntoVAE can be installed from PyPI and is available on GitHub at https://github.com/hdsu-bioquant/onto-vae. We used OntoVAE to compute pathway activities and to predict the outcome of a gene knockout and the response to interferon treatment. We then developed COBRA, a tool that extends OntoVAE with an adversarial approach to disentangle the effects of different covariates. We used COBRA to study interferon response, adrenal medulla development, and schizophrenia.
Conclusion: OntoVAE and COBRA are VAE tools built on an interpretable latent space and decoder. They can compute pathway activities and can also be used for predictive modeling; COBRA can additionally extract effects that would otherwise be overshadowed by confounders. Both tools are easy to install and use, making them a valuable resource for the scientific community.
Document type: | Dissertation |
---|---|
Supervisor: | Herrmann, Prof. Dr. Carl |
Place of Publication: | Heidelberg |
Date of thesis defense: | 14 October 2024 |
Date Deposited: | 05 Nov 2024 08:56 |
Date: | 2024 |
Faculties / Institutes: | Fakultät für Ingenieurwissenschaften > Institute of Pharmacy and Molecular Biotechnology |
DDC-classification: | 004 Data processing, Computer science; 570 Life sciences |
Controlled Keywords: | Bioinformatik, Maschinelles Lernen |