%0 Generic
%A Forneris, Mattia
%C Heidelberg
%D 2020
%F heidok:27324
%R 10.11588/heidok.00027324
%T Natural sequence variation as a tool to dissect gene expression regulation in Drosophila melanogaster
%U https://archiv.ub.uni-heidelberg.de/volltextserver/27324/
%X Genetic variation is a major cause of differences between individuals and represents a powerful tool to study gene regulation. By interfering with cis-Regulatory Modules (CRMs), variants can unravel CRM function. On the other hand, predicting the effect of variants on phenotype from the DNA sequence has proven to be challenging. In this thesis, I use Drosophila embryonic development as a model system to study diversity in gene regulation at the transcriptional level. CRMs can be characterized using multiple genome-wide techniques such as DNase hypersensitivity. However, despite having comprehensive CRM maps, it is still difficult to predict which genes each CRM regulates. Functional methods, such as mutagenesis, are effective but scale poorly. To address this issue, I developed an eQTL method (called DHS-eQTL) that makes use of naturally occurring genetic variation to associate CRMs with the genes they regulate. The results reveal 2,967 DHS-eQTLs and indicate a high extent of CRM sharing between genes. We validated the results with in silico and in vitro approaches, and I discuss upcoming in vivo experiments. We observed long-range enhancer regulation, suggesting that commonly used methods to associate genes and enhancers underestimate their distance. The DHS-eQTLs also show that promoter-proximal CRMs have widespread distal activity. The separation between populations causes an increase in genetic differences through drift and adaptation to different environments. We investigated gene expression differences between Drosophila populations from five continents by performing RNA-Seq on 80 inbred fly lines.
We performed multiple quality-control tests to ensure that the gene expression dataset is of high quality. Gene expression profiles show detectable diversity among the fly lines from different continents and confirm what has been observed at the genetic level. In particular, the African population is the most separated, while the American, European and Australian ones show less diversity. In addition, we identified 903 gene and 2,021 exon eQTLs. Genetic variants can interfere with Transcription Factor Binding Sites (TFBS) and this might, in turn, lead to changes in chromatin accessibility. We applied LS-GKM (an SVM method that uses gapped k-mers) to learn sequence features of tissue-specific accessible chromatin and predict the impact of natural sequence variation on accessibility. We trained LS-GKM on six tissue-specific training sets: neuroectodermal, mesodermal and double-negative CRMs, each divided into promoter-proximal and promoter-distal. The method recovers tissue-specific TFBS in an unbiased manner and shows good performance despite the small training sets. Finally, we scored variants from groups of inbred Drosophila lines. Interestingly, rare variants have a higher impact on accessibility.