title: Machine Learning and Optimization for de novo Protein Design
creator: Jendrusch, Michael Adrian
subject: ddc-004
subject: 004 Data processing Computer science
subject: ddc-500
subject: 500 Natural sciences and mathematics
subject: ddc-570
subject: 570 Life sciences
description: Machine learning-based approaches to protein design have been successfully applied to a variety of design tasks, ranging from unconditional de novo design to the design of protein binders, enzymes and large protein assemblies. Two main approaches have been used to achieve this: hallucination-based methods, which invert protein structure predictors to generate protein structures, and diffusion models, which iteratively generate protein structures from random noise. While these methods are highly successful, they suffer from a three-way trade-off between the designability of generated structures, the flexibility of the method for tackling different protein design tasks and the speed of the method. Hallucination methods are slow but achieve high designability and flexibility, while diffusion models trade flexibility for speed and designability. In this thesis, I present two approaches to protein design with the aim of addressing this three-way trade-off. In the first part, I introduce AlphaDesign, a hallucination-based method. AlphaDesign generates monomers, oligomers and protein binders with high success rates. To demonstrate its real-world utility, I apply it to design inhibitors of the bacterial toxin RcaT-Sen2, which show in vivo inhibition activity. AlphaDesign produces protein designs with high computational success rates and yields inhibitors of a challenging target protein that are active in vivo. However, like other hallucination-based methods, it suffers from long runtimes and undesirable O(N³) scaling with the number of amino acids designed.
In the second part of this thesis, I develop salad, a family of protein diffusion models with O(N) runtime complexity, with the aim of addressing the issue of efficiency. salad outperforms previous protein diffusion models in terms of both speed and designability. To overcome the lack of flexibility in protein diffusion models, I combine salad with structure-editing, a modified generative process for protein diffusion models. This allows salad to solve various protein design tasks without additional model training. Combined with structure-editing, salad is the first protein diffusion model to de novo design conformation-changing proteins as well as superhelical repeat proteins. In this way, salad+structure-editing provides a versatile toolbox for computational protein design, simultaneously addressing the three-way trade-off of speed, designability and flexibility.
date: 2025
type: Dissertation
type: info:eu-repo/semantics/doctoralThesis
type: NonPeerReviewed
format: application/pdf
identifier: https://archiv.ub.uni-heidelberg.de/volltextserver/36676/1/Thesis_PDFA_MAJendrusch.pdf
identifier: DOI:10.11588/heidok.00036676
identifier: urn:nbn:de:bsz:16-heidok-366763
identifier: Jendrusch, Michael Adrian (2025) Machine Learning and Optimization for de novo Protein Design. [Dissertation]
relation: https://archiv.ub.uni-heidelberg.de/volltextserver/36676/
rights: info:eu-repo/semantics/openAccess
rights: Please see front page of the work
language: eng