%0 Generic
%A Jendrusch, Michael Adrian
%C Heidelberg
%D 2025
%F heidok:36676
%K Generative models, protein design
%R 10.11588/heidok.00036676
%T Machine Learning and Optimization for de novo Protein Design
%U https://archiv.ub.uni-heidelberg.de/volltextserver/36676/
%X Machine learning-based approaches to protein design have been successfully applied to a variety of design tasks, ranging from unconditional de novo design to the design of protein binders, enzymes and large protein assemblies. Two main approaches have been used to achieve this: hallucination-based methods, which invert protein structure predictors to generate protein structures, and diffusion models, which iteratively generate protein structures from random noise. While these methods are highly successful, they suffer from a three-way trade-off between the designability of generated structures, the flexibility of the method for tackling different protein design tasks and the speed of the method. Hallucination methods are slow but achieve high designability and flexibility, while diffusion models trade flexibility for speed and designability. In this thesis, I present two approaches to protein design that aim to address this three-way trade-off. In the first part, I introduce AlphaDesign, a hallucination-based method. AlphaDesign generates monomers, oligomers and protein binders with high computational success rates. To demonstrate its real-world utility, I apply it to design inhibitors of the bacterial toxin RcaT-Sen2, a challenging target protein, and show that these inhibitors are active in vivo. However, like other hallucination-based methods, AlphaDesign suffers from long runtimes and unfavorable O(N³) scaling with the number of amino acids designed.
In the second part of the thesis, I develop salad, a family of protein diffusion models with O(N) runtime complexity, to address the issue of efficiency. salad outperforms previous protein diffusion models in terms of both speed and designability. To overcome the lack of flexibility in protein diffusion models, I combine salad with structure-editing, a modified generative process for protein diffusion models. This allows salad to solve various protein design tasks without the need for additional model training. Combined with structure-editing, salad is the first protein diffusion model to design conformation-changing proteins as well as superhelical repeat proteins de novo. In this way, salad+structure-editing provides a versatile toolbox for computational protein design that simultaneously addresses the three-way trade-off between speed, designability and flexibility.