Automated modelling of multimeric protein complexes from heterogeneous structures

Davis, Chad

German Title: Automatisierte Modellierung multimerer Proteinkomplexe mit heterogenen Strukturen

Citation of documents: Please do not cite the URL displayed in your browser's address bar. Instead, use the DOI, URN or persistent URL below, whose long-term accessibility we can guarantee.

DOI: 10.11588/heidok.00011363
URN: urn:nbn:de:bsz:16-opus-113632
URL: http://www.ub.uni-heidelberg.de/archiv/11363

Abstract

Protein interaction networks provide an increasingly complex picture of the relationships between macromolecules in the cell. Complementing these interactions with structural data provides critical insights into interaction mechanisms. However, structural information is available for only a tiny fraction of the protein interactions and complexes currently known. To address this gap, we have developed a method that predicts macromolecular complex structures by systematically combining pairwise interactions of known structure. We first identify all interactions within a network that are of known structure, or sufficiently similar to a known structure to permit homology modelling. We then use these structural constraints to construct models of complexes. We tackle the combinatorial explosion with an efficient algorithm that exploits heuristics to reduce the large search space, and complement it with an automated scoring system that filters out the exponentially large number of unrealistic complexes, leaving a ranked set of the most plausible models. To test the approach, we defined a benchmark set of complexes of known structure and show that many complexes can be re-created with good accuracy, even using templates below 75% sequence identity. Certain models are much larger and more complete than is possible with traditional modelling techniques. The approach can identify the most plausible homology models for a complex of dozens of proteins within a few hours. We applied the approach to whole-proteome sets of complexes from S. cerevisiae. For the complexes of known structure, we are able to identify the native complex in the majority of cases. We provide promising models for several dozen additional complexes, including multiple isoforms for each. Modelled complexes also provide functional classification, particularly for unannotated complexes from structural genomics initiatives.
We show that the best results are achieved when the stoichiometry of the components is known and when the modelling is approached hierarchically, with core components, representing high-confidence interactions, modelled before non-obligate interactions. We are refining this aspect of the automated modelling and making the procedure publicly available via a web service to aid in the analysis of models. As the number of structurally resolved interactions grows, our ability to model larger and more diverse complexes will grow exponentially.
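
The idea of combining pairwise interaction templates into ranked, clash-free models can be illustrated with a toy sketch. This is not the algorithm from the thesis: the function name `assemble`, the 2D offsets standing in for real structural superpositions, the integer template scores, and the distance-based clash test are all assumptions made purely for illustration.

```python
import math

def assemble(templates, seed, min_dist=1.0):
    """Enumerate clash-free toy models and rank them by total template score.

    templates: dict mapping a protein pair (a, b) to a list of candidate
    templates, each an ((dx, dy), score) tuple giving the hypothetical
    position of b relative to a. Real templates would be 3D transforms.
    """
    proteins = {p for pair in templates for p in pair}
    models = []

    def extend(placed, score):
        if len(placed) == len(proteins):
            models.append((score, dict(placed)))
            return
        # Collect every possible placement of each unplaced protein that
        # interacts with an already placed partner.
        options = {}
        for (a, b), candidates in templates.items():
            for (dx, dy), s in candidates:
                if a in placed and b not in placed:
                    pos = (placed[a][0] + dx, placed[a][1] + dy)
                    options.setdefault(b, []).append((pos, s))
                elif b in placed and a not in placed:
                    pos = (placed[b][0] - dx, placed[b][1] - dy)
                    options.setdefault(a, []).append((pos, s))
        if not options:
            return  # remaining proteins are not connected to the model
        target = min(options)  # branch on one protein per level
        for pos, s in options[target]:
            # Prune placements that clash with anything already placed.
            if all(math.dist(pos, q) >= min_dist for q in placed.values()):
                placed[target] = pos
                extend(placed, score + s)
                del placed[target]

    extend({seed: (0.0, 0.0)}, 0)
    return sorted(models, key=lambda m: m[0], reverse=True)

# Hypothetical network: A interacts with B and with C, two templates each.
example = {
    ("A", "B"): [((2.0, 0.0), 9), ((0.0, 2.0), 5)],
    ("A", "C"): [((2.0, 0.0), 8), ((-2.0, 0.0), 6)],
}
ranked = assemble(example, seed="A")
```

In this example the highest-scoring template for each pair would place B and C at the same position, so that combination is pruned and the remaining three models are returned in descending score order. A realistic implementation would additionally need heuristics to limit the search, as the abstract describes, since exhaustive enumeration grows exponentially with the number of templates.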

Document type:	Dissertation
Supervisor:	Steinmetz, Dr. Lars
Date of thesis defense:	30 November 2010
Date Deposited:	15 Dec 2010 12:54
Date:	2010
Faculties / Institutes:	The Faculty of Bio Sciences > Dean's Office of the Faculty of Bio Sciences
DDC-classification:	570 Life sciences
Uncontrolled Keywords:	Proteinstruktur, Strukturbioinformatik, Proteinkomplexe, Homology modelling, protein complexes, structural bioinformatics