PDF, English, main document (80MB) | License: Creative Commons Attribution 4.0
Abstract
Active Learning (AL) promises to reduce annotation costs by strategically selecting the most informative samples for labeling. However, a significant theory-practice gap has hindered its adoption in real-world applications, particularly in biomedical image segmentation, where annotation costs are substantial. This thesis addresses this gap with a rigorous evaluation methodology that directly simulates practical use cases in order to measure annotation savings in 3D medical imaging. The work proceeds in several steps.
First, we formalize requirements for the evaluation of AL so that the measurements yield trustworthy insights into the reduction in annotation effort that practitioners can expect on their own datasets, and we propose a simplified decision framework based on the economic nature of AL. Then, we identify and formalize critical evaluation pitfalls that have hindered the practical application of AL research and create a comprehensive framework for evaluating deep learning-based AL. Our analysis shows that AL enables additional efficiency gains on well-optimized pipelines, but that the absolute improvements over random baselines are smaller than on poorly optimized pipelines. Further, we show that an adequate evaluation of the benefits of AL must assess it in combination with orthogonal techniques such as self-supervised learning and hyperparameter optimization. Through a systematic analysis of uncertainty estimation for semantic segmentation, we resolve key misconceptions in the field: we demonstrate that ensemble methods provide superior estimates of epistemic uncertainty, which are in theory essential for AL, and we clarify that test-time augmentation models epistemic uncertainty rather than aleatoric uncertainty, contrary to previous claims. These insights directly inform query method design for segmentation tasks. Building on these foundations, we develop nnActive, a comprehensive framework for AL deployment in 3D biomedical segmentation that integrates best practices while introducing domain-specific adaptations, including partial annotation strategies and improved random baselines. Through nnActive, we document previously unreported phenomena and find insufficient evidence that current uncertainty-based AL methods outperform improved random strategies. Finally, we develop ClaSP PE, an uncertainty-based query method that combines class-stratified selection with scheduled power-noise injection and is specifically designed to address common failure modes in biomedical segmentation.
Critically, we validate ClaSP PE through roll-out evaluation on four held-out datasets across diverse anatomical structures and imaging modalities, explicitly simulating real-world deployment. This provides the strongest empirical evidence to date for practical AL effectiveness in 3D biomedical imaging, demonstrating consistent annotation reductions when AL is properly evaluated and carefully deployed. This thesis establishes that AL, while not a universal solution, is likely to deliver measurable annotation savings as a final optimization step in well-engineered pipelines, providing the evaluation principles, methodological insights, and practical tools necessary for evidence-based AL deployment in biomedical image segmentation.
| Document type: | Dissertation |
|---|---|
| Supervisor: | Maier-Hein, Prof. Dr. Klaus |
| Place of Publication: | Heidelberg |
| Date of thesis defense: | 20 April 2026 |
| Date Deposited: | 04 May 2026 14:09 |
| Date: | 2026 |
| Faculties / Institutes: | The Faculty of Mathematics and Computer Science > Department of Computer Science |
| DDC-classification: | 000 Generalities, Science; 004 Data processing, Computer science |
| Controlled Keywords: | Image processing, Machine learning, Computer science, Radiology, Medicine |








