Efficient Deep Learning at Inference Time for Gram Stained Image Classification

Kim, Hee E.

PDF, English

Citation of documents: Please do not cite the URL displayed in your browser's address bar; instead, use the DOI, URN or persistent URL below, as we can guarantee their long-term accessibility.

DOI: 10.11588/heidok.00034902
URN: urn:nbn:de:bsz:16-heidok-349024

Abstract

Deep learning (DL) and artificial intelligence (AI) are woven into the fabric of our daily lives, and they have also shown promise in the medical domain. Despite numerous studies published in the last decade on AI applications in medicine, DL models have yet to be widely implemented in daily clinical practice. Among the many obstacles on the path to a thriving healthcare AI landscape, this dissertation focuses specifically on technical issues related to constrained hardware resources. To address this problem, I investigated and demonstrated optimal DL techniques based on the use case of Gram-stain analysis for microorganism identification.

Efficient DL techniques such as transfer learning, pruning and quantization can be employed during model training, and deployment strategies should be considered in advance. In particular, I advocate applying transfer learning with pre-trained models as feature extractors, as opposed to introducing novel model architectures. For Gram-stain classification, DL models could be compressed and test-time performance could be accelerated without compromising test accuracy or loss. While pruning contributed to a 15× reduction in model size, quantizing the bit representation from 32-bit to 8-bit accelerated inference by 3×. Taking the quantization configuration into account, the findings demonstrated that per-channel quantization outperformed tensor-wise quantization for the majority of DL models. This outcome contradicts conventional assumptions; however, intensive quantization may hinder the generalization of DL models. Therefore, the optimal configuration of DL models should be determined empirically, depending on the custom task and data. In the majority of setups, vision transformers (VT) exhibited superior model performance compared to convolutional neural networks (CNN). Notably, among these configurations, DeiT tiny emerged as the fastest VT model in the int8 configuration, processing six images per second.
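The difference between tensor-wise and per-channel quantization mentioned above can be illustrated with a minimal sketch. It is not taken from the dissertation: the toy weight matrix, the symmetric int8 scheme, and all helper names are assumptions chosen for illustration. It only shows why giving each output channel its own scale can lower the rounding error when channels have very different value ranges.

```python
# Illustrative sketch only: symmetric int8 quantization of a toy weight matrix,
# comparing one scale for the whole tensor with one scale per output channel (row).
# The weights and helper names are hypothetical, not taken from the dissertation.

def scale_for(values):
    """Symmetric scale: the largest magnitude maps to the int8 code 127."""
    peak = max(abs(v) for v in values)
    return peak / 127 if peak > 0 else 1.0

def quantize_row(row, scale):
    """Round floats to int8 codes in [-127, 127], then map back to floats."""
    codes = [max(-127, min(127, round(w / scale))) for w in row]
    return [c * scale for c in codes]

def per_tensor(weights):
    """Quantize every row with a single scale shared by the whole tensor."""
    scale = scale_for([w for row in weights for w in row])
    return [quantize_row(row, scale) for row in weights]

def per_channel(weights):
    """Quantize each row (output channel) with its own scale."""
    return [quantize_row(row, scale_for(row)) for row in weights]

def mean_abs_error(original, restored):
    """Average absolute difference between original and dequantized weights."""
    pairs = [(a, b) for ro, rr in zip(original, restored) for a, b in zip(ro, rr)]
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

# Two output channels with very different dynamic ranges.
weights = [
    [0.010, -0.020, 0.015, 0.005],
    [1.500, -2.000, 0.750, 1.250],
]

err_tensor = mean_abs_error(weights, per_tensor(weights))
err_channel = mean_abs_error(weights, per_channel(weights))
print(f"per-tensor  mean abs error: {err_tensor:.6f}")
print(f"per-channel mean abs error: {err_channel:.6f}")
```

In this toy case the small-valued channel is rounded much more coarsely under the shared scale, so the per-channel variant reports the lower error. Whether that translates into better accuracy on a real model is an empirical question, as the abstract itself stresses.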

By harnessing the investigated efficient DL techniques including transfer learning, pruning and quantization, this doctoral research might provide valuable insights for AI researchers to accelerate the pace of innovation in the medical domain and pave the way for the seamless integration of AI into everyday healthcare practices.

Translation of abstract (German, rendered here in English)

Deep learning (DL) and artificial intelligence (AI) are an integral part of our daily lives and show promise in the medical domain. Nevertheless, the studies published in the medical domain have not yet been translated into daily clinical practice on a large scale. Given the numerous obstacles on the path to a thriving healthcare AI landscape, this dissertation focuses specifically on technical problems related to limited hardware resources. To address this problem, in this doctoral thesis I investigated and presented optimal deep learning techniques, based on the use case of Gram-stain analysis for the identification of microorganisms.

Efficient DL techniques such as transfer learning, pruning and quantization can be used during the model training phase, and deployment strategies should be considered early on. In particular, I advocate applying transfer learning to pre-trained models in the form of feature extractors rather than introducing new model architectures. For the classification of Gram stains, DL models could be compressed and test-time performance accelerated without compromising test accuracy or loss. While pruning contributed to a 15-fold reduction in model size, quantizing the bit representation from 32 bit to 8 bit accelerated inference times 3-fold. With regard to the quantization configuration, the results showed that per-channel quantization outperformed per-tensor quantization for the majority of DL models, regardless of whether they were vision transformers (VT) or convolutional neural networks (CNN). This result runs counter to the common assumption; nevertheless, intensive quantization could potentially hinder the generalization of DL models for Gram-stain classification. Therefore, the optimal configuration of DL models should be determined empirically, depending on the individual task and data. In most settings, vision transformers (VT) showed superior model performance compared to convolutional neural networks (CNN). Notably, among these configurations, DeiT tiny emerged as the fastest VT model in the int8 configuration, processing six images per second.

By using efficient DL techniques and developing a comprehensive strategy for model deployment, AI researchers will accelerate the pace of innovation in the medical domain. This acceleration is expected to pave the way for the seamless integration of AI into everyday healthcare practice, offer valuable support for healthcare providers, and play a decisive role in advancing patient care.

Document type:	Dissertation
Supervisor:	Ganslandt, Prof. Dr. med. Thomas
Place of Publication:	Heidelberg
Date of thesis defense:	28 May 2024
Date Deposited:	12 Jun 2024 10:18
Date:	2024
Faculties / Institutes:	Medizinische Fakultät Mannheim > Zentrum für Präventivmedizin und Digitale Gesundheit Baden-Württemberg
DDC-classification:	004 Data processing, Computer science; 610 Medical sciences, Medicine
Controlled Keywords:	Efficient Deep Learning, Gram Stained Image Classification, Artificial Intelligence