Integration and visualization of scientific big data to aid systems biology research

Binder, Janos

PDF, English - main document
Download (2MB) | License: Creative Commons Attribution 3.0 Germany

PDF, English (Erratum)
Download (226kB) | Terms of use

Information on the subcellular and tissue localization of proteins is important for understanding their cellular functions. However, obtaining such information is not trivial: one needs to consult model organism databases, evaluate the results of high-throughput experiments, read the ever-growing literature, and, when no prior knowledge on localization is available, use prediction tools. Collecting and integrating this information is tedious and difficult, and there is a clear need for evidence integration efforts. In my thesis I explored a new way of integrating and presenting localization evidence for the scientific community.

First, I discuss the COMPARTMENTS resource, which I developed in collaboration to provide a comprehensive view of protein localization. This resource integrates the above-mentioned sources and maps the evidence to common protein and localization identifiers. In addition, we developed a text-mining pipeline to extract protein-localization associations from the scientific literature. To facilitate comparison of the different types and sources of evidence, we assigned confidence scores to the localization evidence. To provide a simple overview, we visualize the evidence on a schematic of a cell. Finally, we link each piece of evidence to its source so that users can find more details.
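
As a rough illustration of the integration step described above, the following minimal sketch maps evidence records from different sources to shared identifiers and keeps the highest-confidence record per protein-compartment pair. The alias tables, the identifiers (`PROT:A`, `LOC:nucleus`), the example names, and the rule of keeping the maximum score are all assumptions made for illustration; the abstract does not specify how the resource combines evidence.

```python
from dataclasses import dataclass

# Hypothetical alias tables; a real resource would rely on curated mappings.
PROTEIN_ALIASES = {"prota": "PROT:A", "prot-a": "PROT:A", "protb": "PROT:B"}
LOCATION_ALIASES = {
    "nucleus": "LOC:nucleus",
    "nuclear": "LOC:nucleus",
    "mitochondria": "LOC:mitochondrion",
    "mitochondrion": "LOC:mitochondrion",
}


@dataclass(frozen=True)
class Evidence:
    protein: str       # name as reported by the source
    location: str      # localization term as reported by the source
    source: str        # e.g. "database", "experiment", "textmining", "prediction"
    confidence: float  # assumed to be on one shared scale across sources


def integrate(records):
    """Map records to common identifiers and keep the best record per pair."""
    best = {}
    for rec in records:
        protein = PROTEIN_ALIASES.get(rec.protein.lower())
        location = LOCATION_ALIASES.get(rec.location.lower())
        if protein is None or location is None:
            continue  # skip records that cannot be mapped to a common identifier
        key = (protein, location)
        if key not in best or rec.confidence > best[key].confidence:
            best[key] = rec
    return best
```

Keeping the whole record, not only the score, preserves the link from each integrated entry back to its source.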

Large-scale analysis with the COMPARTMENTS resource is also possible using the bulk download files. I illustrate its usefulness by identifying pairs of compartments that share a statistically significant number of human proteins and by showing that protein-protein interaction networks can be used to infer the localization of interacting partners.
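
One common way to ask whether two compartments share more proteins than expected by chance is a one-sided hypergeometric test. The sketch below is an assumption about how such an analysis could look, not the method used in the thesis, which the abstract does not name; the function names and inputs are hypothetical.

```python
from math import comb


def overlap_p_value(total, size_a, size_b, shared):
    """Return P(X >= shared) for a hypergeometric overlap.

    Models drawing size_b proteins at random from a background of `total`
    proteins, of which size_a are annotated to compartment A.
    """
    upper = min(size_a, size_b)
    tail = sum(
        comb(size_a, k) * comb(total - size_a, size_b - k)
        for k in range(shared, upper + 1)
    )
    return tail / comb(total, size_b)


def compartment_pair_p_value(proteins_a, proteins_b, background):
    """Convenience wrapper that takes sets of protein identifiers."""
    return overlap_p_value(
        len(background), len(proteins_a), len(proteins_b),
        len(proteins_a & proteins_b),
    )
```

When many compartment pairs are tested, the resulting p-values would need a multiple-testing correction before any pair is called significant.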

Next, I present the TISSUES resource, which integrates evidence on tissue expression. The resource presents the evidence in the same way as COMPARTMENTS; however, it integrates more high-throughput experimental datasets. My contribution was to create reusable components; I created a simple graphical overview based on the type and confidence score of the evidence. I also improved the text mining of human tissues by filtering the underlying localization keywords.

Finally, I study integration at the identifier level through the example of disease databases. Ontologies are useful in data integration; however, not all of them are of the same quality. Therefore, we created a modified version of the text-mining pipeline to map entries from Online Mendelian Inheritance in Man (OMIM) to the Disease Ontology (DO). Moreover, we established a collaboration with the team behind the ontology, who use these mappings as a basis for its next version. Overall, this thesis provides novel solutions for integrating biological data at different levels.

Item Type: Dissertation
Supervisor: Kummer, Prof. Dr. Ursula
Date of thesis defense: 30 September 2014
Date Deposited: 16 Dec 2014 14:07
Date: 2014
Faculties / Institutes: The Faculty of Bio Sciences > Dean's Office of the Faculty of Bio Sciences
Service facilities > European Molecular Biology Laboratory (EMBL)
Subjects: 000 Generalities, Science
004 Data processing Computer science
570 Life sciences