<> "The repository administrator has not yet configured an RDF license."^^ . <> . . "Eliminating flaws in biomedical image analysis validation"^^ . "The field of automatic biomedical image analysis substantially benefits from the rise of Artificial Intelligence (AI). Proper validation of those AI algorithms is, however, frequently neglected in favor of a strong focus on the development and exploration of new models. This research practice can, however, be risky since it may propagate poorly validated algorithms that could cause adverse outcomes for patients. Thus, a thorough and high-quality validation is crucial for any algorithm to potentially be used in clinical practice. This particularly holds true for biomedical image analysis competitions, so-called challenges, which have emerged as the state-of-the-art technique for comparative assessment of AI algorithms and determining which is the most effective in solving a certain research question. Challenges have strong implications. While challenge winners typically receive large monetary awards and are highly cited, the algorithm also stands a better chance of being translated into clinical practice. Given the tremendous importance of challenges, it is surprising that hardly any attention has so far been given to quality control.\r\n\r\nThe objective of the work presented in this thesis was to analyze common practice in challenges, to systematically reveal flaws in both challenges and general image analysis validation, to propose solutions to eliminate those issues and to improve general validation practice. Contributions related to the analysis of flaws and strategies for improvement are presented for four areas: challenge design, validation metrics, rankings, as well as reporting and result analysis. \r\n\r\nFirst, we demonstrate that challenges are highly heterogeneous yet not standardized, making it difficult to assess their overall quality. We further show that the research community is concerned about critical quality issues of challenges. The community eagerly asked for more quality control and best practice recommendations. Moreover, we evidence how effortlessly both challenge participants and organizers could, in theory, manipulate challenges by taking advantage of potential security holes in the challenge design. To compensate for this issue, we introduce a structured challenge submission system to collect comprehensive information about the challenge design, which can then be critically reviewed by independent referees. \r\n\r\nWe further demonstrate that validation metrics, the key measures in the assessment of AI algorithms, come with critical limitations that are often not taken into account during validation. In fact, researchers typically favor the use of common metrics without being aware of the numerous pitfalls pertaining to their use. An exhaustive list of metric-related pitfalls in the context of image-level classification, semantic segmentation, instance segmentation, and object detection tasks is provided in this thesis. To promote the selection of validation metrics based on their suitability to the underlying research problem rather than popularity, we propose a problem-driven metric recommendation framework that empowers researchers to make educated decisions while being made aware of the pitfalls to avoid. \r\n\r\nSince challenge rankings are an integral part of competitions, we place particular emphasis on analyzing the stability and robustness of rankings against changes in the ranking computation method. 
Finally, the transparency of validation studies is one of the core elements of high-quality research and should thus be carefully considered. However, our analysis of the transparency and reproducibility of both challenge designs and participating algorithms shows that this is often not the case, substantially decreasing the interpretability of challenge results. To facilitate and enhance challenge transparency, we present a guideline for challenge reporting. In addition, we introduce the concept of challenge registration, i.e., publishing the complete challenge design before execution. This concept has already been applied successfully in clinical trials; it increases the transparency and reliability of a challenge, as it makes substantial changes to the design traceable. We further show that challenge results can be used for a dedicated strength-weakness analysis of participating algorithms, from which future algorithm development can benefit greatly when addressing unsolved issues.

In summary, this thesis uncovers several critical flaws in biomedical image analysis challenges and algorithm validation. In response, it also introduces several measures that have already proven their practice-changing impact and substantially increased the overall quality of challenges, especially for the well-known Medical Image Computing and Computer Assisted Intervention (MICCAI) and IEEE International Symposium on Biomedical Imaging (ISBI) conferences. The suggested advancements in challenge design promise to give rise to competitions with a higher level of reliability, interpretability, and trust. The overall findings and suggested improvements are not specific to challenges alone, but generalize to the entire field of algorithm validation. This thesis thus paves the way for high-quality, thorough validation of AI algorithms, which is crucial to prevent inefficient or clinically useless algorithms from being translated into clinical practice.

Author: Annika Reinke
Year: 2023
Subject classification (DDC): 000 Generalities, Science
Full text (PDF): phd_thesis_reinke_bib.pdf