TY  - GEN
TI  - Generalised Medical Object Detection via Self-Configuring Method Design
A1  - Baumgartner, Michael Anton
ID  - heidok37068
Y1  - 2025///
CY  - Heidelberg
AV  - public
N2  - The rich information in medical images fuels the need for an increasing number of acquisitions, ultimately resulting in a high workload for clinicians. Finding relevant structures in these images resembles a needle-in-a-haystack problem and, when conducted manually, is an error-prone and time-consuming process. Computer Aided Diagnosis (CAD) tools offer an alternative and can help to alleviate the current burden by speeding up clinical workflows and assisting with a second opinion. Diagnostic tasks, which form the backbone of daily clinical routines, require the fast and accurate identification of critical structures such as (1) vessel occlusions, which can potentially cause an Acute Ischemic Stroke, the second leading cause of death worldwide, or (2) lung cancer manifesting as spherical structures, the leading cause of cancer-related death. Current work on medical image analysis predominantly focuses on semantic segmentation, which has shown great success for precise voxel-wise delineation of targets. However, diagnostic decision-making requires the correct localisation and classification of objects rather than voxels, which cannot be accurately captured by semantic segmentation. Object detection methods can learn to identify objects in an end-to-end fashion, providing great utility by directly solving diagnostic tasks. Adoption in the domain is hampered by limited experience with these methods and the complex configuration of application-specific parameters. This thesis explores the wide range of medical detection tasks and builds the foundation for future work in this important domain.

Three studies are presented to gain insights into important configuration choices of detection methods and highlight their versatility and competitiveness. The first case study revolves around an international challenge to tackle the detection of mediastinal lesions in Computed Tomography (CT) images. Despite its clinical relevance, no public benchmark was available to develop suitable methods for this anatomical region. Our solution based on a one-stage anchor-based detection model achieves an excellent Free-response Receiver Operating Characteristic (FROC) score of 0.9897, resulting in near-optimal sensitivity for this task. The submitted method achieved the third rank in the challenge, underlining the competitiveness of detection methods for diagnostic tasks.

The second study explores the quick and reliable identification of vessel occlusions in Computed Tomography Angiography (CTA) images. Instead of relying on hand-crafted heuristics and extensive pre-processing schemes that limit the applicability of current solutions, our method can detect an arbitrary number of occlusions without restrictions on certain vessels. Our study includes three cohorts, two of which were collected in a pseudo-prospective manner from external hospitals to evaluate the real-world impact of our method. The proposed method achieves high sensitivity (≥81%) and negative predictive value (≥93%) on these cohorts, highlighting its clinical utility for identifying patients at risk. Qualitative inspection revealed the ability to find High-grade Stenoses (HGS), which were not labelled within the training cohort but constitute clinically relevant findings. We compared our method against two commercially available CE-marked and FDA-approved software solutions and demonstrated significant improvements over these, especially for the difficult-to-detect Medium Vessel Occlusions (MeVOs). Our solution is publicly available via a web platform: https://stroke.ccibonn.ai.

Thirdly, the feasibility of different detection models for the medical domain is explored. Detection Transformer models do not rely on additional proxy formulations with prior anchor boxes and offer direct set prediction capabilities, bypassing the requirement for manual heuristics during training and inference. Our study explores the utility of these models for diagnostic tasks by comparing the performance of three direct set prediction models of varying complexity against a strong anchor-based detection baseline. Two simpler designs, using single-scale information, cannot compete with anchor-based approaches, while the more complex model, using multi-scale deformable attention, performs on par with or better than the baseline.

Based on a newly established development pool consisting of ten data sets and equipped with the experiences from the initial three case studies, we developed the first generalising detection method, nnDetection. Following nnU-Net's design principles, we systematise the configuration process of medical detection methods by identifying rule-based, fixed and empirical design choices. It distils the knowledge from hundreds of experiments and several years of experience into a self-configuring method design. We build a unified framework to incorporate a heterogeneous set of object detection models based on single-stage, two-stage and direct-set prediction designs. To offer the best possible utility of our method, models with box-level and a combination of box- and voxel-level supervision are incorporated to handle diverse annotation types. We evaluate our method on nine previously unseen detection tasks, introducing new modalities, anatomical regions and object structures. nnDetection outperforms two baselines and five ablation models on this diverse pool of tasks. Additionally, we compare the generalising design of our method on three benchmarking data sets against current task-specific detection solutions and show that nnDetection achieves state-of-the-art results. Our work establishes a standardised baseline and easy entry point in the detection domain to catalyse future research. It democratises the availability of volumetric detection methods by offering out-of-the-box applicability to new data sets without requiring expert knowledge.

In summary, the work in this thesis has the potential to revolutionise the field of medical object detection by establishing a new development paradigm aimed at designing a generalising method. Our experiments on manually configuring detection methods demonstrate the utility and superiority of the proposed approaches over existing solutions. By distilling our findings into a self-configuring method, we make our knowledge available to the entire community and build the foundation for the next generation of medical detection methods. We have already leveraged the capabilities of nnDetection to compete in several international challenges with great success: ADAM 2020 (first rank detection track), MELA 2022 (third rank), TDSC-ABUS 2023 (second rank detection track) and INSTED 2024 (first rank). The code release of a preliminary version of nnDetection has already attracted a lot of interest in the community and can be found under https://github.com/MIC-DKFZ/nnDetection.
UR  - https://archiv.ub.uni-heidelberg.de/volltextserver/37068/
ER  -