On deep learning for diagnosis and prognostication

Shmatko, Artem

PDF, English

DOI: 10.11588/heidok.00038541
URN: urn:nbn:de:bsz:16-heidok-385417

Abstract

Disease risk modelling is central to modern medicine because it enables prevention and precision treatment. Cardiovascular risk scores guide statin prescriptions, tumour classifications inform surgical planning and therapy selection, and screening programmes stratify populations for early cancer detection. Traditionally, such predictions relied on simple statistical models with a few variables. The growing availability of rich clinical data, from electronic health records spanning decades of patient histories to high-resolution histopathological images, opens the door to deep learning models that deliver more accurate, personalised predictions. Nevertheless, developing such models and translating them into clinical practice remain challenging because of limited interpretability, data biases, and limited generalisation to unseen cohorts. To be clinically useful, both prognostic models, which estimate future disease risks, and diagnostic models, which classify present conditions, must produce accurate and calibrated outputs.

This thesis presents two novel deep learning methods, one addressing a prognostic task and one a diagnostic task. The first, Delphi, is a transformer-based model that predicts the risk of multiple diseases from a patient's prior medical history in electronic health records. Trained on UK Biobank data, Delphi treats health trajectories as sequences of medical events and predicts future diseases autoregressively. Grounded in time-to-event analysis, it combines the competing-risks exponential framework with a transformer architecture, retaining interpretability while capturing complex interactions among diseases. From a patient's medical history, Delphi predicts risks across the entire spectrum of human disease, achieving strong performance on over 1,000 conditions with well-calibrated predictions that remain accurate up to a decade ahead. As a generative model, it can sample synthetic lifelong health trajectories, which are useful for estimating future disease burden at population scale and for training privacy-preserving models. An interpretability analysis based on SHAP values shows that Delphi organises diseases into clinically meaningful comorbidity clusters and learns temporal dependencies between them. However, the same analysis also reveals biases inherited from the training data, such as an immortality bias stemming from cohort recruitment, as well as more subtle effects tied to the availability of data sources.
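
To make the competing-risks exponential framework concrete, here is a minimal Python sketch, assuming that the model maps a patient's history to one non-negative rate (events per year) for each possible next event. It is not Delphi's code: `toy_rate_fn` is a hypothetical, hand-written stand-in for the transformer, and the event names and rates are illustrative. Each candidate event receives an exponential waiting time, the earliest one becomes the next event, and the loop appends it to the history and repeats, which is what autoregressive trajectory sampling amounts to under this framework. `cumulative_incidence` gives the textbook probability that disease k is the first event within a horizon t when the rates are held constant, (λ_k / Λ)(1 − exp(−Λt)) with Λ the sum of all rates.

```python
import math
import random


def sample_next_event(rates, rng):
    """Draw (event, waiting_time) under competing exponential risks.

    Each candidate event k gets an independent waiting time T_k ~ Exp(rates[k]);
    the earliest one wins. This is equivalent to drawing the waiting time from
    Exp(sum of rates) and picking event k with probability rates[k] / sum(rates).
    """
    times = {k: rng.expovariate(r) for k, r in rates.items() if r > 0}
    event = min(times, key=times.get)
    return event, times[event]


def cumulative_incidence(rates, k, horizon):
    """P(first event is k and happens within `horizon`), if rates stay constant."""
    total = sum(rates.values())
    return rates[k] / total * (1.0 - math.exp(-total * horizon))


def toy_rate_fn(history, age):
    """Hypothetical stand-in for the transformer: per-year rates given the history."""
    seen = {event for event, _ in history}
    base = {"hypertension": 0.02, "type 2 diabetes": 0.01, "death": 0.001}
    # Rates grow with age; events already in the history are not repeated.
    rates = {k: r * math.exp(0.05 * (age - 40)) for k, r in base.items() if k not in seen}
    # Illustrative interaction: a prior hypertension event doubles the diabetes rate.
    if "hypertension" in seen and "type 2 diabetes" in rates:
        rates["type 2 diabetes"] *= 2
    return rates


def sample_trajectory(rate_fn, start_age, max_age, rng):
    """Autoregressively sample (event, age) pairs until death or max_age.

    Rates are re-evaluated after every event and held constant until the next
    one (piecewise-constant hazards).
    """
    age, history = start_age, []
    while True:
        event, dt = sample_next_event(rate_fn(history, age), rng)
        age += dt
        if age > max_age:
            break
        history.append((event, age))
        if event == "death":
            break
    return history


rng = random.Random(0)
rates = {"hypertension": 0.05, "type 2 diabetes": 0.02, "death": 0.01}
print(round(cumulative_incidence(rates, "hypertension", 10.0), 4))
print(sample_trajectory(toy_rate_fn, 40.0, 90.0, rng))
```

Holding rates constant between events keeps each sampling step in closed form while still letting every new event change all subsequent rates, which is how history-dependent interactions between diseases enter such a model.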

The second project, Hetairos, addresses the diagnostic task of classifying central nervous system (CNS) tumours from histopathological images. While DNA methylation profiling has become the gold standard for CNS tumour subtyping, it remains inaccessible in many settings due to cost and infrastructure requirements. Hetairos uses multiple-instance learning to predict 102 methylation-based subtypes directly from readily available scans of haematoxylin- and eosin-stained tissue sections. Trained on over 6,000 slides from Heidelberg and validated across 10 external institutions spanning four continents, Hetairos achieves 87–88% accuracy for high-confidence predictions. In a prospective validation of 210 cases, its predictions were available within two days of surgery, compared with sixteen days for molecular testing, making Hetairos a promising tool for rapid precision diagnostics.
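
The abstract states only that Hetairos uses multiple-instance learning, so the following is an illustration of the general idea rather than Hetairos' architecture. This minimal Python sketch treats a slide as a bag of tile embeddings, assumed to come from some pretrained feature extractor applied to image tiles, and aggregates them with attention-based MIL pooling in the style of Ilse et al. (2018), a common choice in computational pathology that is swapped in here. `V`, `w` and `W` are random placeholders, not trained weights. `high_confidence_accuracy` shows how accuracy can be restricted to confident predictions; the threshold is hypothetical, since the abstract does not say how "high-confidence" is defined.

```python
import math
import random


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def attention_mil_pool(tiles, V, w):
    """Attention-based MIL pooling over one bag (slide) of tile embeddings.

    Each tile h gets a score w . tanh(V h); scores are softmax-normalised over
    the bag, and the slide embedding is the attention-weighted mean of tiles.
    """
    scores = [sum(wi * math.tanh(z) for wi, z in zip(w, matvec(V, h))) for h in tiles]
    attn = softmax(scores)
    dim = len(tiles[0])
    slide = [sum(a * h[j] for a, h in zip(attn, tiles)) for j in range(dim)]
    return slide, attn


def predict_subtype_probs(tiles, V, w, W):
    """Slide-level class probabilities from a linear head (one row of W per subtype)."""
    slide, _ = attention_mil_pool(tiles, V, w)
    return softmax(matvec(W, slide))


def high_confidence_accuracy(probs, labels, threshold):
    """Accuracy and coverage on predictions whose top probability >= threshold."""
    kept = [
        (max(range(len(p)), key=p.__getitem__), y)
        for p, y in zip(probs, labels)
        if max(p) >= threshold
    ]
    if not kept:
        return float("nan"), 0.0
    accuracy = sum(pred == y for pred, y in kept) / len(kept)
    return accuracy, len(kept) / len(labels)


# Toy example: random placeholder weights, 8-dim tile embeddings, 3 subtypes.
rng = random.Random(0)
dim, hidden, n_classes = 8, 4, 3
V = [[rng.gauss(0, 0.5) for _ in range(dim)] for _ in range(hidden)]
w = [rng.gauss(0, 0.5) for _ in range(hidden)]
W = [[rng.gauss(0, 0.5) for _ in range(dim)] for _ in range(n_classes)]
slide_tiles = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(50)]
print(predict_subtype_probs(slide_tiles, V, w, W))
```

Because the pooled embedding is a weighted average over tiles, it does not depend on tile order and accepts any number of tiles, which is why MIL suits whole-slide images that have only a slide-level label. Reporting accuracy together with coverage makes explicit the trade-off that a confidence threshold introduces.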

Together, these projects illustrate both the promise and the challenges of deep learning in medicine. Both models achieve strong performance, yet both reveal how data acquisition shapes what models learn in potentially undesirable ways. For Delphi, observation bias manifests as learned associations between data source availability and predicted disease rates. For Hetairos, scanner-specific colour profiles and technical batch effects create domain shifts that degrade performance on external cohorts. Bridging the gap between research and clinical practice will require advances in training methodology, data collection practices, and regulatory frameworks for AI-assisted diagnostics. Despite these open problems, the results presented in this thesis demonstrate that deep learning can scale to population-level datasets, capture temporal disease dynamics, and detect patterns inaccessible to conventional workflows, laying the foundations for improved clinical decision-making and a shift towards personalised, preventive medicine.

Document type:	Dissertation
Supervisor:	Stegle, Prof. Dr. Oliver
Place of Publication:	Heidelberg
Date of thesis defense:	23 March 2026
Date Deposited:	28 Apr 2026 11:00
Date:	2026
Faculties / Institutes:	The Faculty of Bio Sciences > Dean's Office of the Faculty of Bio Sciences
DDC-classification:	004 Data processing, Computer science; 500 Natural sciences and mathematics; 570 Life sciences; 610 Medical sciences, Medicine