Integrative Analysis of Omics Datasets

Toprak, Umut

PDF, English

Citation of documents: Please do not cite the URL displayed in your browser's address bar; instead, use the DOI, URN or persistent URL below, whose long-term accessibility we can guarantee.

DOI: 10.11588/heidok.00027429
URN: urn:nbn:de:bsz:16-heidok-274291
URL: http://www.ub.uni-heidelberg.de/archiv/27429

Abstract

Cancer is a disease of aberrant cell proliferation and tumour growth arising from the perturbation of the epigenetically defined, regulated and maintained cell identity by genetic mutations. It is a leading cause of death worldwide and most cancer types remain incurable. Omics technologies are quantitative analytical assays that allow high-quality and high-throughput measurements of different aspects of cellular regulation including genomics, transcriptomics, epigenomics, proteomics and metabolomics. These high-throughput technologies transformed the way cancer research is done, leading to tremendous advances in our understanding of cancer biology and modern targeted therapies.

Integrative analysis of multi-omics datasets in cancer research requires dedicated algorithms and data analysis and visualization tools. These are developed and applied by interdisciplinary teams of scientists and clinicians working on collaborative projects. Two contemporary research challenges are the technical complexity of analysing and integrating the data, and enabling all project partners to explore the observations efficiently and independently. This dissertation presents results addressing a broad spectrum of these challenges.

Chapter 1, Replacing the CNS-PNET Superentity with Four Novel Molecularly Defined Entities Driven by Structural Variants: Central nervous system primitive neuroectodermal tumours (CNS-PNETs) were a heterogeneous family of paediatric brain tumours with no histopathological markers, challenging diagnosis and poor prognosis. My work as a computational biologist contributed to the comprehensive description of this entity. In this study, we applied an integrative omics data analysis of methylomes, transcriptomes and genomes revealing that CNS-PNETs are a combination of a large group of misdiagnosed cases from other entities and four novel molecularly defined entities. I showed that these novel entities are driven by distinct and recurrent molecular drivers altered by different mechanisms of structural variants: the FOXR2 oncogene and MN1, CIC and BCOR tumour suppressors. Our results contributed to the elimination of CNS-PNETs as an officially recognized cancer entity and the recognition of four novel paediatric brain tumour entities in the World Health Organization classification of brain tumours.

Chapter 2, SOPHIA, Structural Rearrangement Detection Based on Supplementary Alignments and a Population Background Model: Building on my work on structural variation in our study of CNS-PNETs, I developed the SOPHIA algorithm, which detects structural variants (SVs) in cancer genomes using a large population background database, together with a corresponding bioinformatics tool that allows fast detection of SVs from short-read cancer genome sequencing datasets. SOPHIA later became the standard tool for structural variant detection in the DKFZ’s cancer genome analysis workflow.

Chapter 3, EPISTEME, an Interactive and Integrative Platform for Analysing, Interpreting and Sharing Multi-Omics Data: During the development of SOPHIA and my research in projects analysing and interpreting structural variant data, I gained experience in analysing structural variants detected by SOPHIA, integrating them with other omics layers such as gene expression, and interpreting, visualizing and sharing the results with collaborators who were not computational scientists. Based on this experience and using modern tools for interactive data visualization, I developed EPISTEME, an interactive platform for integrative omics data analysis and visualization, with the aim of facilitating omics data analysis by scientists who have conceptual knowledge of cancer omics but no programming skills. EPISTEME is a comprehensive tool integrating genome, transcriptome, methylome and proteome data with clinical metadata in a user-friendly web-based system with in-browser statistical analyses and publication-quality vector graphics output.

Chapter 4, SOPHIA-EPISTEME Integration in DKFZ Cancer Genomics Projects Reveals Novel Disease Subtypes and Insights Across Cancer Types: With the integration of SOPHIA and EPISTEME in an integrative omics data analysis setting, my work identified novel oncogenes activated by enhancer hijacking and revealed novel molecularly defined subtypes in refractory multiple myeloma (MYCN enhancer hijacking via immunoglobulin rearrangements as a MYC replacement), adult acute myeloid leukaemia (MNX1 activation via enhancer hijacking putatively acting as a differentiation block mechanism) and paediatric neuroblastoma (ATOH1 activation via enhancer hijacking putatively acting as a MYCN replacement) in projects supported by the DKFZ Heidelberg Center for Personalized Oncology (DKFZ-HIPO) and the German Society for Paediatric Oncology and Haematology (GPOH) cancer research programmes.

Document type:	Dissertation
Supervisor:	Brors, Prof. Dr. Benedikt
Place of Publication:	Heidelberg
Date of thesis defense:	18 October 2019
Date Deposited:	04 Dec 2019 11:59
Date:	2019
Faculties / Institutes:	The Faculty of Bio Sciences > Dean's Office of the Faculty of Bio Sciences
DDC-classification:	004 Data processing, Computer science; 570 Life sciences; 610 Medical sciences, Medicine