- PDF, English: main document (2MB)
- PDF, English: Erratum (226kB)
Abstract
Information on protein subcellular and tissue localization is important for understanding the cellular functions of proteins. However, obtaining such information is not trivial: one needs to consult model organism databases, evaluate the results of high-throughput experiments, read the ever-increasing literature and, when no previous knowledge on localization is available, use prediction tools. Collecting and integrating the necessary information is tedious and difficult, and there is a clear need for evidence integration efforts. In my thesis I explored a new way of integrating and presenting localization evidence for the scientific community.
First I discuss the COMPARTMENTS resource, which I developed in collaboration with others to provide a comprehensive view of protein localization. This resource integrates the above-mentioned sources and maps the evidence to common protein and localization identifiers. In addition, we developed a text-mining pipeline to find protein-localization associations in the scientific literature. To facilitate comparison of the different types and sources of evidence, we assigned confidence scores to the localization evidence. To provide a simple overview, we visualize the evidence on a schematic of a cell. Finally, we link the evidence to its source to give users more detail.
Large-scale analysis with the COMPARTMENTS resource is also possible using its bulk download files. I have illustrated its usefulness by identifying pairs of compartments that share a statistically significant number of human proteins and by showing that protein-protein interaction networks can be used to infer the localization of interacting partners.
Next I present the TISSUES resource, which integrates evidence on tissue expression. The resource presents the evidence in the same way as COMPARTMENTS; however, it integrates more high-throughput experimental datasets. My contribution was to create reusable components; I created a simple graphical overview based on the type and confidence score of the evidence. I have also improved the text mining of human tissues by filtering the underlying localization keywords.
Finally, I study integration at the identifier level through the example of disease databases. Ontologies are useful in data integration; however, not all of them are of the same quality. Therefore, we created a modified version of the text-mining pipeline to map entries from Online Mendelian Inheritance in Man (OMIM) to the Disease Ontology (DO). Moreover, we established a collaboration with the team behind the ontology, who use these mappings as a basis for its next version. Overall, this thesis provides novel solutions for integrating biological data at different levels.
Document type: | Dissertation |
---|---|
Supervisor: | Kummer, Prof. Dr. Ursula |
Date of thesis defense: | 30 September 2014 |
Date Deposited: | 16 Dec 2014 14:07 |
Date: | 2014 |
Faculties / Institutes: | The Faculty of Bio Sciences > Dean's Office of the Faculty of Bio Sciences; Service facilities > European Molecular Biology Laboratory (EMBL) |
DDC-classification: | 000 Generalities, Science; 004 Data processing, Computer science; 570 Life sciences |