%0 Generic %A Sellner, Jan %C Heidelberg %D 2024 %F heidok:35083 %K hyperspectral imaging, hyperspectral tissue classification, organ segmentation, surgical scene segmentation, surgical data science, open surgery, deep learning, machine learning, domain generalization, geometrical domain shifts %R 10.11588/heidok.00035083 %T Generalizable Surgical Scene Segmentation of Hyperspectral Images %U https://archiv.ub.uni-heidelberg.de/volltextserver/35083/ %X Complications following surgery have recently been identified as the third leading cause of death globally. One of the significant challenges surgeons face is the visual discrimination of tissue types. Automatic surgical scene segmentation with hyperspectral imaging (HSI) could offer valuable assistance in this regard. However, the current state of the art in this field has primarily focused on conventional RGB videos with limited spectral information, mostly from minimally invasive surgery, while HSI data and data obtained during open surgery have received little attention. Moreover, work in this area is constrained by small datasets, studies with only a few subjects, or a limited number of tissue types. While deep learning-based scene segmentation is promising, it does not come without its own challenges: the generalizability of the models toward unknown data distributions, the robustness to variations in the surgical scene and the efficiency of the training process remain open questions. Consequently, the goal of this thesis is to address these problems. Firstly, we analyze the high-dimensional spectral information to gain a deeper understanding of the spectral characteristics and variability of various tissue types across different groups. 
Leveraging a tissue atlas of unprecedented size, comprising 9057 images from 46 subjects annotated with 20 classes, we demonstrate that fully automatic tissue discrimination using a deep neural network is feasible with a high accuracy of 95.4 % (standard deviation (SD) 3.6 %). We employ the principles of linear mixed model analysis to reveal that the most significant source of variability in spectral data is the tissue under observation rather than specific acquisition conditions. Recognizing the need within the HSI community for large open datasets, we make a portion of our data publicly available. Secondly, numerous networks need to be trained during the development of a segmentation approach. However, networks trained on HSI data are slow due to the large number of spectral channels, which leads to data loading bottlenecks and results in long training runs, low utilization of the graphics processing unit (GPU) and delayed inference. To address this, we conduct a benchmark of various strategies to speed up data loading, including the introduction of a new concept to optimize the transfer from the random-access memory (RAM) to the GPU. By combining all strategies, we achieve a speedup factor of up to 3.6 and nearly saturated GPU utilization. Thirdly, equipped with an optimized training pipeline, we tackle the task of robust surgical scene segmentation. Given the predominance of RGB data, we compare the benefit of HSI data to RGB data and to processed HSI data (e.g., tissue parameters such as perfusion). The community has not converged on the optimal input representation of HSI data for a neural network, which is why we explore the best input representation with respect to the spatial granularity of the input data (pixels vs. superpixels vs. patches vs. images). 
Through a comprehensive validation study involving 506 images from 20 subjects, fully semantically annotated with 19 classes, we discover that HSI data outperforms RGB and processed HSI data across all spatial granularities. Moreover, the advantage of HSI increases with decreasing spatial granularity. Our image HSI model consistently ranks first in our study, achieving an average Dice similarity coefficient (DSC) of 0.90 (SD 0.04). This segmentation score is on par with the inter-rater variability, which has an average DSC of 0.89 (SD 0.07). Fourthly, even though machine learning models have proven to be powerful, they are also known to face generalization issues when applied to out-of-distribution (OOD) data. Therefore, we conduct a generalizability assessment for the subject (variations induced by individuals), context (variations due to geometrical changes in the neighborhood) and species (variations when moving from one species to another) domain shifts. We find that the subject domain has only a minor impact on both the spectra and the image level. On the other hand, contextual changes significantly deteriorate the segmentation performance, with a drop in DSC of up to 0.48 (SD 0.38), revealing the struggles of neural networks with geometrical OOD data. To address this important bottleneck, we propose a simple, network-independent organ transplantation augmentation, achieving a DSC of up to 0.91 (SD 0.10) and bringing the segmentation performance on par with in-distribution data. This result is backed up by a validation study involving 600 fully semantically annotated images from 33 subjects and a comparison with other topology-aware augmentations, in which our proposed augmentation always ranks first. 
For the species domain, we utilize a large human dataset, comprising 777 images from 230 subjects fully semantically annotated with 16 classes, to demonstrate that segmentation on human data is more challenging than on porcine data and that the inclusion of porcine data in the training process offers no direct benefit. In conclusion, we are the first to present fully semantic scene segmentation networks operating on HSI data that can differentiate between 19 classes occurring during open surgery, can be trained efficiently and are robust against contextual domain shifts. Our results are substantiated by extensive validation studies with several large datasets, some of which are publicly available as part of our open data efforts. Thereby, we make a valuable contribution to the broader goal of improving surgical interventions by leveraging the potential of HSI data with the power of machine learning algorithms. The code for all experiments in this thesis, as well as pretrained models, is available at github.com/IMSY-DKFZ/htc.