title: Visual Representation Learning with Minimal Supervision
creator: Büchler, Uta
subject: ddc-004
subject: 004 Data processing Computer science
description: Computer vision aims to provide computers with the human abilities of understanding and interpreting the visual surroundings. An essential element of comprehending the environment is extracting relevant information from complex visual data so that the desired task can be solved. For instance, to distinguish cats from dogs, the feature 'body shape' is more relevant than 'eye color' or the 'number of legs'. In traditional computer vision it is conventional to develop handcrafted functions that extract specific low-level features, such as edges, from visual data. However, in order to solve a particular task satisfactorily, we require a combination of several features. Thus, the approach of traditional computer vision has the disadvantage that whenever a new task is addressed, a developer needs to manually specify all the features the computer should look for. For that reason, recent work has primarily focused on developing new algorithms that teach the computer to autonomously detect relevant, task-specific features. Deep learning has been particularly successful in this regard. In deep learning, artificial neural networks automatically learn to extract informative features directly from visual data. The majority of deep learning strategies require a dataset with annotations that indicate the solution of the desired task. The main bottleneck is that creating such a dataset is tedious and time-intensive, since every sample needs to be annotated manually. This thesis presents new techniques that attempt to keep the amount of human supervision to a minimum while still reaching satisfactory performance on various visual understanding tasks. In particular, this thesis focuses on self-supervised learning algorithms that train a neural network on a surrogate task for which no human supervision is required. We create an artificial supervisory signal by breaking the order of visual patterns and asking the network to recover the original structure (a minimal illustrative sketch of such an order-recovery task follows this record). Besides demonstrating the abilities of our model on common computer vision tasks such as action recognition, we additionally apply our model to biomedical scenarios. Many research projects in medicine involve extensive manual processes that prolong the development of successful treatments. Taking the example of analyzing the motor function of neurologically impaired patients, we show that our self-supervised method can help to automate tedious, visually based processes in medical research. In order to perform a detailed analysis of motor behavior and, thus, provide a suitable treatment, it is important to discover and identify the negatively affected movements. Therefore, we propose a magnification tool that can detect and enhance subtle changes in motor function, including differences in motor behavior across individuals. In this way, our automatic diagnostic system not only analyzes apparent behavior but also facilitates the perception and discovery of impaired movements. Learning a feature representation without requiring annotations significantly reduces human supervision. However, using an annotated dataset generally leads to better performance than self-supervised learning. Hence, we additionally examine semi-supervised approaches, which efficiently combine a few annotated samples with large unlabeled datasets. Consequently, semi-supervised learning represents a good trade-off between annotation time and accuracy.
date: 2021
type: Dissertation
type: info:eu-repo/semantics/doctoralThesis
type: NonPeerReviewed
format: application/pdf
identifier: https://archiv.ub.uni-heidelberg.de/volltextserver/29205/1/Buechler_VisualRepresentationLearningWithMinimalSupervision.pdf
identifier: DOI:10.11588/heidok.00029205
identifier: urn:nbn:de:bsz:16-heidok-292054
identifier: Büchler, Uta (2021) Visual Representation Learning with Minimal Supervision. [Dissertation]
relation: https://archiv.ub.uni-heidelberg.de/volltextserver/29205/
rights: info:eu-repo/semantics/openAccess
rights: http://archiv.ub.uni-heidelberg.de/volltextserver/help/license_urhg.html
language: eng
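
The description above centers on an order-recovery surrogate task: visual patterns (for example, video frames) are shuffled, and a network is trained, without any manual labels, to identify the permutation that was applied. The PyTorch snippet below is only a minimal sketch of this general idea, not the architecture or permutation-selection strategy developed in the dissertation; the names OrderPredictionNet, make_permutations, and shuffle_batch, the toy MLP encoder, and all hyperparameters are assumptions chosen for brevity.

```python
# Illustrative sketch only: a generic order-prediction surrogate task,
# not the method from the dissertation. All names are hypothetical.
import itertools
import random

import torch
import torch.nn as nn

SEQ_LEN = 4            # frames (or patches) per training tuple
NUM_PERMUTATIONS = 12  # subset of the 4! = 24 possible orderings

def make_permutations(seq_len, num_perms, seed=0):
    """Pick a fixed subset of orderings; each one becomes a class label."""
    rng = random.Random(seed)
    all_perms = list(itertools.permutations(range(seq_len)))
    return rng.sample(all_perms, num_perms)

PERMUTATIONS = make_permutations(SEQ_LEN, NUM_PERMUTATIONS)

class OrderPredictionNet(nn.Module):
    """Embed each frame with a shared encoder, then classify the permutation."""
    def __init__(self, frame_dim=3 * 64 * 64, embed_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(frame_dim, embed_dim), nn.ReLU()
        )
        self.classifier = nn.Linear(SEQ_LEN * embed_dim, NUM_PERMUTATIONS)

    def forward(self, frames):  # frames: (B, SEQ_LEN, C, H, W)
        feats = [self.encoder(frames[:, t]) for t in range(SEQ_LEN)]
        return self.classifier(torch.cat(feats, dim=1))

def shuffle_batch(frames):
    """Create the artificial supervisory signal: permute the frame order and
    use the index of the applied permutation as a label that costs nothing."""
    labels = torch.randint(NUM_PERMUTATIONS, (frames.size(0),))
    shuffled = torch.stack([
        frames[i, list(PERMUTATIONS[int(labels[i])])]
        for i in range(frames.size(0))
    ])
    return shuffled, labels

# One illustrative training step on random tensors standing in for video clips.
model = OrderPredictionNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
frames = torch.randn(8, SEQ_LEN, 3, 64, 64)   # unlabeled data
shuffled, labels = shuffle_batch(frames)
loss = nn.functional.cross_entropy(model(shuffled), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In practice the per-frame encoder would be a convolutional network rather than the toy MLP used here, and the representation learned on the surrogate task could then be reused for downstream tasks such as the action recognition and motor-behavior analysis mentioned in the description.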