Exploiting emerging DNA sequencing technologies to study genomic rearrangements

Meiers, Sascha

Citation of documents: Please do not cite the URL displayed in your browser's address bar. Instead, use the DOI, URN or persistent URL below, as we can guarantee their long-term accessibility.

DOI: 10.11588/heidok.00024506
URN: urn:nbn:de:bsz:16-heidok-245063
URL: http://www.ub.uni-heidelberg.de/archiv/24506

Abstract

Structural variants (SVs) alter the structure of chromosomes by deleting, duplicating or otherwise rearranging pieces of DNA. They contribute the majority of nucleotide differences between humans and are known to play causal roles in many diseases. Since the advent of massively parallel sequencing (MPS) technologies, SVs have been studied more comprehensively than ever before. However, in contrast to the detection of smaller types of genetic variation, SV detection is still fundamentally hampered by the limitations of short-read sequencing, which cannot sufficiently cope with the complexity of large genomes. Emerging DNA sequencing technologies and protocols hold the potential to overcome some of these limitations. In this dissertation, I present three distinct studies, each utilizing such emerging techniques to detect, validate and/or characterize SVs. These technologies, together with novel computational approaches that I developed, make it possible to characterize SVs that had previously been challenging, or even impossible, to assess.

First, inversions - a class of SVs that is notoriously difficult to ascertain - were studied in the context of the 1000 Genomes Project. These inversions had previously been predicted from low-coverage short-read sequencing data, but remained inconclusive in classical PCR validation experiments. Using sequencing data from modern long-read technologies (Pacific Biosciences and Oxford Nanopore Technologies), I was able to validate hundreds of them. I then developed a computational tool to visualize long-read data, and discovered that the majority of loci harbored complex SVs rather than simple inversions. These findings suggest that the amount of complex structural variation in the human genome had so far been under-appreciated, owing to limitations in its detection using standard techniques.

In the second part, I explored the functional impact of large SVs on gene expression and chromatin organization. Previously, a series of studies described drastic effects of SVs on the regulation of genes via mechanisms that alter the three-dimensional conformation of DNA. However, these studies had focused on pathological phenotypes and on a few selected genes. We hence set out to study gene expression and chromatin conformation in highly rearranged chromosomes of Drosophila melanogaster without a pathological phenotype. I first utilized Hi-C, which we applied in order to measure chromatin conformation, to characterize the rearrangements present in these chromosomes. Then, despite the presence of 15 breakpoints, we found no evidence for a conformation-related mechanism acting on gene regulation. This is particularly surprising as the majority of these breakpoints disrupted topologically associating domains. This study hence sheds new light on the role of chromatin conformation that is complementary to the findings of previous studies. In addition, it demonstrates the capability of Hi-C technology to reveal structural variation.

Third, I present the current state of a collaborative effort to enable SV detection in single cells. Studies of somatic mosaicism, i.e. of the genetic heterogeneity among cells, have so far been severely limited in their ability to discover SVs, especially copy-neutral and complex rearrangements. We hence conceived a novel method to infer - for the first time - at least seven different SV classes in single cells. This approach utilizes three independent signals that are identifiable in single-cell stranded template sequencing (Strand-seq) data. Here, I present a computational method called MosaiCatcher that realizes this idea, and provide examples that demonstrate its feasibility. In order to explore the limitations of this method, I designed a versatile framework for the simulation of Strand-seq data and used it to assess the performance of one of the central steps of MosaiCatcher. Once completed, this novel method will facilitate studies of SV heterogeneity and mosaicism in the context of cancer and ageing.

In summary, I utilized emerging technologies to discover SVs - notably copy-neutral and complex rearrangements - that had so far eluded detection based on MPS. This led to novel insights into the complexity and functional impact of these SVs. Moreover, I developed computational tools that advance our capabilities for SV detection and characterization, and that might help future studies gain a deeper understanding of the role of SVs in health and disease.

Document type:	Dissertation
Supervisor:	Zaugg, Dr. Judith
Date of thesis defense:	14 May 2018
Date Deposited:	28 May 2018 07:03
Date:	2018
Faculties / Institutes:	The Faculty of Bio Sciences > Dean's Office of the Faculty of Bio Sciences; Service facilities > European Molecular Biology Laboratory (EMBL)
DDC-classification:	570 Life sciences
Controlled Keywords:	genetics, bioinformatics, DNA, sequencing
Uncontrolled Keywords:	structural variation, DNA sequencing, bioinformatics