
Integration and visualization of scientific big data to aid systems biology research

Binder, Janos

Main document: PDF, English (2 MB). License: Creative Commons Attribution 3.0 Germany.

Erratum: PDF, English (226 kB).


Abstract

Information on the subcellular and tissue localization of proteins is important for understanding their cellular functions. However, obtaining such information is not trivial: one needs to consult model organism databases, evaluate the results of high-throughput experiments, read the ever-growing literature and, when no prior knowledge on localization is available, use prediction tools. Collecting and integrating the necessary information is tedious and difficult, and there is a clear need for evidence integration efforts. In my thesis, I explored a new way of integrating and presenting localization evidence for the scientific community.

First, I discuss the COMPARTMENTS resource, which I developed collaboratively to provide a comprehensive view of protein localization. This resource integrates the above-mentioned sources and maps the evidence to common protein and localization identifiers. In addition, we developed a text-mining pipeline to find protein-localization associations in the scientific literature. To facilitate comparison of the different types and sources of evidence, we assigned confidence scores to the localization evidence. To provide a simple overview, we visualize the evidence on a schematic of a cell. Finally, we link the evidence to its source to give users more details.

Large-scale analysis using the COMPARTMENTS resource is also possible through the bulk download files. I illustrate its usefulness by identifying pairs of compartments that share a statistically significant number of human proteins and by showing that protein-protein interaction networks can be used to infer the localization of interacting partners.

Next, I present the TISSUES resource, which integrates evidence on tissue expression. The resource presents the evidence in the same way as COMPARTMENTS; however, it integrates more high-throughput experimental datasets. My contribution was to create reusable components: I created a simple graphical overview based on the type and the confidence score of the evidence. I also improved the text mining of human tissues by filtering the underlying localization keywords.

Finally, I study integration at the identifier level through the example of disease databases. Ontologies are useful in data integration; however, not all of them are of the same quality. Therefore, we created a modified version of the text-mining pipeline to map entries from the Online Mendelian Inheritance in Man (OMIM) to the Disease Ontology (DO). Moreover, we established a collaboration with the team behind the ontology, who use these mappings as a basis for the next version. Overall, this thesis provides novel solutions for integrating biological data at different levels.
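To make the idea of identifier-level mapping concrete, here is a minimal sketch of dictionary-based name matching. It is not the modified text-mining pipeline described in the thesis, whose details the abstract does not give. The function names, the normalization rule and all identifiers (`DO:A`, `OMIM:X` and so on) are hypothetical placeholders.

```python
import re


def normalize(name):
    """Lowercase a name and reduce punctuation and whitespace to single spaces."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", name.lower()).split())


def build_dictionary(ontology_terms):
    """Index ontology terms by every normalized name and synonym.

    ontology_terms: dict mapping a term ID to a list of names/synonyms.
    Returns a dict mapping each normalized name to a set of term IDs.
    """
    lookup = {}
    for term_id, names in ontology_terms.items():
        for name in names:
            lookup.setdefault(normalize(name), set()).add(term_id)
    return lookup


def map_entries(entries, lookup):
    """Map each entry title to the ontology terms with an identical normalized name.

    entries: dict mapping an entry ID to its title.
    Returns a dict mapping each entry ID to a sorted list of matching term IDs
    (empty if nothing matches).
    """
    return {
        entry_id: sorted(lookup.get(normalize(title), set()))
        for entry_id, title in entries.items()
    }


# Placeholder identifiers; these are not real DO or OMIM accession numbers.
ontology_terms = {
    "DO:A": ["Cystic fibrosis", "mucoviscidosis"],
    "DO:B": ["Marfan syndrome"],
}
entries = {
    "OMIM:X": "CYSTIC FIBROSIS",
    "OMIM:Z": "unknown disorder",
}
mapping = map_entries(entries, build_dictionary(ontology_terms))
```

Exact matching on normalized names is deliberately conservative: it yields few false mappings but misses spelling variants, which is one reason a fuller text-mining approach is attractive for this task.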

Document type: Dissertation
First reviewer: Kummer, Prof. Dr. Ursula
Date of examination: 30 September 2014
Date deposited: 16 Dec. 2014 14:07
Year of publication: 2014
Institutes/Facilities: Faculty of Biosciences > Dean's Office of the Faculty of Biosciences
Central and Other Facilities > European Molecular Biology Laboratory (EMBL)
DDC subject group: 000 Generalities, Science, Computer Science
004 Computer Science
570 Life Sciences, Biology