title: Analysis of whole-genome sequencing data from ICGC-PanCancer project creator: Hong, Chen description: Cancer is one of the greatest health challenges of the 21st century and one of the deadliest diseases in the world. It is a group of different diseases which are caused by abnormal cell growth. In the human body, cell division and apoptosis are well regulated under normal circumstances so that the number of cells is in a dynamic balance. However, normal cells could transform into tumor cells because of genetic mutations. The tumorigenesis can happen in almost any cell of the human body. One of the central tools to address cancer is the profiling of cancer cell genomes and transcriptomes by next generation sequencing (NGS) and subsequent analysis by computational methods. The Pan-Cancer Analysis of Whole Genomes (PCAWG) project is the core project of the International Cancer Genome Consortium. This project provides massive amounts of cancer biological data for analysis. Include more than 2900 patients and 48 types of cancer samples. As part of this intensive effort, I have conducted a very detailed analysis on the molecular mechanisms of cancers. In particular, I conducted a comprehensive study of the relationship between genomic mutations and cancer development. These series of studies include the exploration of cancer driver genes, analysis of telomere maintenance mechanisms and data visualization at the cohort level. First, I explored potential cancer genes by performing statistical analysis of genomic point mutations, insertions and deletions, copy number variations and structural variations. Further, I analyzed the distribution of point mutations and structure variations in cancer genomes. Based on Knudson's two-hit hypothesis, I integrated point mutation and copy number variation information to construct a biallelic inactivation map of the cancer genome. With the biallelic inactivation information, I analyzed potential cancer drivers and applied this finding to synthetic lethality assays associated with cancer driver genes to uncover novel genetic targets that could be used to treat cancer patients with certain driver gene defects. In addition, I designed and improved the CaSINo model to score the relative mutation frequency of chromosomal sequences to screen for potential cancer driver mutations, which can be used not only in coding genes but also in non-coding regions. Moreover, I analyzed point mutations on promoters, trying to find those mutation sites that play a key role in the up-regulation of gene expression. Finally, I designed and improved a scoring method for copy number variation focality to explore the association of focal copy number variation with cancer driver genes at the cohort level. Second, as part of the PCAWG research projects, I analyzed the mechanisms of telomere maintenance in cancer cells. After analyzing the differences between alternative telomere lengthening and telomerase-positive samples, I designed a machine learning model based on repeat sequences, content, and mutation rate to determine whether an unknown cancer sample is an alternative lengthening of telomere (ALT) or telomerase-positive. Finally, for the massive data of the PCAWG project, I designed and implemented two bioinformatics visualization tools. TumorPrint is software in R and shell, which can be used to visualize genomic mutations and RNA-seq expression levels of a single gene or gene pairs, allowing users to quickly search for genes or gene pairs of interest. GenomeTornadoPlot is a software written in the R language for visualizing focal copy number variants of a single gene or adjacent paired genes, and can automatically calculate its copy number variation aggregation score. date: 2023 type: Dissertation type: info:eu-repo/semantics/doctoralThesis type: NonPeerReviewed format: application/pdf identifier: https://archiv.ub.uni-heidelberg.de/volltextserverhttps://archiv.ub.uni-heidelberg.de/volltextserver/33775/1/PhD_Thesis_september_2023.pdf identifier: DOI:10.11588/heidok.00033775 identifier: urn:nbn:de:bsz:16-heidok-337755 identifier: Hong, Chen (2023) Analysis of whole-genome sequencing data from ICGC-PanCancer project. [Dissertation] relation: https://archiv.ub.uni-heidelberg.de/volltextserver/33775/ rights: info:eu-repo/semantics/openAccess rights: http://archiv.ub.uni-heidelberg.de/volltextserver/help/license_urhg.html language: eng