Application of motif scoring algorithms for enhancer prediction in distantly related species

Dolle, Dirk-Dominik


PDF, English (Dissertation) - Main document

Citing documents: Please do not cite the URL shown in your web browser's address bar. Use instead the DOI, URN, or persistent URL given here, whose long-term availability we guarantee.

DOI: 10.11588/heidok.00013988
URN: urn:nbn:de:bsz:16-heidok-139888

Abstract

Although many studies have proposed methods for identifying enhancers, reliable prediction on a genome-wide scale remains an unsolved problem. One reason is the highly flexible regulatory logic underlying detectable enhancer activity. In each cell type or tissue, and at any given time, a mostly unknown set of transcription factors activates specific regulatory elements by coordinated binding to the corresponding genomic region. The position, spacing, and orientation of the individual bound factors can vary between enhancers yet produce highly similar spatio-temporal activity. Because of this inherent flexibility, so-called “alignment-free” methods have been proposed for enhancer prediction, as they can cope with rearrangements by comparing word profiles rather than linear sequence. However, the problems caused by allowing for permutation in sequence comparison have not been investigated so far. In this study I implemented several published alignment-free metrics and analysed which parameters affect their ability to predict regulatory regions. The results show that single point mutations and the growing number of spurious matches with decreasing word size pose the biggest challenges to alignment-free techniques, especially when applied on a genome-wide scale. Alignment algorithms usually solve these problems quite efficiently but cannot handle permutation. I therefore implemented a new technique for enhancer prediction that combines the advantages of both algorithm types and used it to identify regulatory regions in the teleost fish Oryzias latipes (medaka), based on a set of known and validated human enhancers. Predicted medaka regions and human enhancers were subsequently tested in an in vivo enhancer assay and analysed for their activity. In total, 12 predicted regions corresponding to 9 human enhancers showed clear enhancing activity in the fish.
This shows that the principle implemented here can predict active enhancers at a high rate on a genome-wide scale, even in species as diverged as human and fish. Furthermore, evidence was found for motif-level conservation between some of the human and medaka enhancers that was invisible to most of the alignment algorithms used for comparison.
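
The idea of comparing word profiles rather than linear sequence can be illustrated with a minimal sketch. This is not the method developed in the dissertation or any of the specific published metrics it evaluates; it is a generic word-count (k-mer) comparison using cosine similarity. The function names and the short example sequences are hypothetical and were chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq: str, k: int) -> Counter:
    """Count every overlapping word of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(p: Counter, q: Counter) -> float:
    """Cosine of the angle between two word-count vectors."""
    dot = sum(count * q[word] for word, count in p.items())
    norm_p = sqrt(sum(c * c for c in p.values()))
    norm_q = sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


# Hypothetical example: two motif-like blocks joined in opposite order.
block_a = "TGACGTCA"
block_b = "GGGCGGGG"
spacer = "ATATATAT"
original = block_a + spacer + block_b
permuted = block_b + spacer + block_a

# Words inside each block are shared; only words spanning the junctions differ.
similarity_k4 = cosine_similarity(kmer_profile(original, 4), kmer_profile(permuted, 4))

# With k = 1 only base composition is compared, so the two profiles are identical.
similarity_k1 = cosine_similarity(kmer_profile(original, 1), kmer_profile(permuted, 1))

print(f"k=4: {similarity_k4:.3f}  k=1: {similarity_k1:.3f}")
```

In this toy case the permuted sequence keeps a high similarity to the original at k = 4 although a linear alignment would not place both blocks at once. At k = 1 the score no longer distinguishes any two sequences with the same base composition, which hints at why spurious matches increase as the word size decreases.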

Document type:	Dissertation
First reviewer:	Wittbrodt, Prof. Dr. Jochen
Date of examination:	9 November 2012
Date deposited:	19 Nov. 2012 07:55
Date of publication:	13 November 2012
Institutes/Facilities:	Central and Other Facilities > Centre for Organismal Studies Heidelberg (COS)
DDC subject group:	004 Computer science; 500 Natural sciences and mathematics; 570 Life sciences, biology; 590 Animals (zoology)
Uncontrolled keywords:	enhancer, prediction, gene regulation