eprintid: 17752
rev_number: 12
eprint_status: archive
userid: 1513
dir: disk0/00/01/77/52
datestamp: 2014-12-11 07:56:51
lastmod: 2015-01-20 09:52:18
status_changed: 2014-12-11 07:56:51
type: doctoralThesis
metadata_visibility: show
creators_name: Pyl, Paul-Theodor
title: Method development for comparative cancer genomics
divisions: i-140001
divisions: i-850800
adv_faculty: af-14
abstract: With the cost of sequencing continuously dropping and sequencing technologies becoming more widely available, the coming years will bring a wealth of sequencing data that will be both tremendously interesting and challenging to analyse and interpret. It is becoming increasingly clear that both the algorithmic approaches and the handling of the data itself will prove challenging. Some of the legacy approaches and file formats that were developed in the wake of the first large-scale sequencing projects (e.g. the human genome sequencing project) will prove unsuitable, or at least inconvenient, once projects that aim to analyse sequencing data from thousands of samples become the norm rather than the exception. In this thesis I discuss my work in the field of sequencing analysis during my PhD studies. I touch upon some of the challenges researchers in the field are facing today and present the approaches and solutions I have developed to deal with them. To this end I present my work on methods development for sequencing analysis, both in general and specifically in the context of cancer genomics, and give examples of the application of those methods in projects I have been involved in. My two main contributions to the available methods for sequencing analysis are the HTSeq Python library and the h5vc R/Bioconductor package (Anders et al., 2014; Pyl et al., 2014). I co-developed the former with Simon Anders and am the lead developer of the latter.
Both pieces of software are available through public repositories and are well documented and maintained. The projects in which those methods have found application are the HeLa Kyoto sequencing project (Landry et al., 2013) and a set of three cancer genomics projects involving cohorts of up to 18 whole genome sequencing (WGS) samples and up to 21 whole exome sequencing (WES) samples, respectively. I discuss my methodological contributions to these projects, as well as relevant biological results, in Section 5.
date: 2014
id_scheme: DOI
id_number: 10.11588/heidok.00017752
ppn_swb: 816146462
own_urn: urn:nbn:de:bsz:16-heidok-177528
date_accepted: 2014-11-26
language: eng
bibsort: PYLPAULTHEMETHODDEVE2014
full_text_status: public
citation: Pyl, Paul-Theodor (2014) Method development for comparative cancer genomics. [Dissertation]
document_url: https://archiv.ub.uni-heidelberg.de/volltextserver/17752/1/PhD.Thesis.Paul.Thedor.Pyl.pdf