eprintid: 30629
rev_number: 19
eprint_status: archive
userid: 5988
dir: disk0/00/03/06/29
datestamp: 2021-10-21 09:38:53
lastmod: 2021-11-19 10:40:48
status_changed: 2021-10-21 09:38:53
type: doctoralThesis
metadata_visibility: show
creators_name: Quintero Moreno, Andrés Felipe
title: Learning the Parts of Omics: Inference of Molecular Signatures with Non-negative Matrix Factorization
subjects: ddc-004
subjects: ddc-500
divisions: i-140001
adv_faculty: af-14
cterms_swd: nicht-negative Matrixfaktorisierung (non-negative matrix factorization)
cterms_swd: Bioinformatik (bioinformatics)
cterms_swd: Krebs (cancer)
cterms_swd: Neuroblastom (neuroblastoma)
cterms_swd: Genomik (genomics)
abstract: Background: Feature extraction and signature identification are two critical steps in understanding diverse biological processes. Signatures are defined as groups of molecular features that are sufficient to identify a certain genotype or phenotype. Non-negative Matrix Factorization (NMF) in particular has been used to identify signatures in complex genomic datasets. However, running even a basic NMF analysis is challenging, with a steep learning curve and long computing times; furthermore, the usability of these algorithms is limited by the scarcity of resources for interpreting their results. This creates a pressing need for tools that mitigate these obstacles.
Results: In this study we developed ButchR and ShinyButchR, a fast and user-friendly toolkit to decompose datasets (slicing genomics) and learn signatures using NMF. The package can be freely installed from GitHub at https://github.com/wurst-theke/ButchRr. We used ButchR to identify a new regulatory subtype of neuroblastoma, which showed mesenchymal characteristics and was phenotypically associated with multipotent Schwann cell precursors. Additionally, we created a new workflow to infer regulatory relationships between genes and their _cis_-regulatory elements in individual cells, followed by inference of regulatory signatures.
Conclusions: ButchR/ShinyButchR is a useful toolkit for analyzing multiple types of data and inferring signatures that capture relevant biological information. This toolkit is a valuable new resource for the scientific community and can be used to understand complex biological processes.
date: 2021
id_scheme: DOI
id_number: 10.11588/heidok.00030629
ppn_swb: 1778016286
own_urn: urn:nbn:de:bsz:16-heidok-306292
date_accepted: 2021-09-21
language: eng
bibsort: QUINTEROMOLEARNINGTH2021
full_text_status: public
place_of_pub: Heidelberg
citation: Quintero Moreno, Andrés Felipe (2021) Learning the Parts of Omics: Inference of Molecular Signatures with Non-negative Matrix Factorization. [Dissertation]
document_url: https://archiv.ub.uni-heidelberg.de/volltextserver/30629/1/thesis.pdf