eprintid: 30731
rev_number: 33
eprint_status: archive
userid: 6282
dir: disk0/00/03/07/31
datestamp: 2021-11-22 11:41:09
lastmod: 2021-11-26 06:46:40
status_changed: 2021-11-22 11:41:09
type: doctoralThesis
metadata_visibility: show
creators_name: Milbich, Timo
title: Visual Similarity and Representation Learning
subjects: ddc-004
divisions: i-110400
divisions: i-110300
adv_faculty: af-11
keywords: Deep Learning, Computer Vision, Representation Learning, Similarity Learning, Metric Learning, Image Classification, Image Retrieval
cterms_swd: Maschinelles Sehen
cterms_swd: Bildverstehen
cterms_swd: Mustererkennung
abstract: Computer Vision aims to artificially mimic the visual reasoning capabilities of humans by using algorithms which, once deployed to mechanical agents and software tools, improve car and traffic safety, enable effective visual search on the World Wide Web, or increase productivity and quality in industrial production processes. Similar to the reasoning processes that constantly occur in our brains, such algorithms directly rely on abstract representations of the objects in the visually perceivable environment and beyond. Consequently, learning informative representations that allow us to detect and recognize objects and to evaluate image scenes is of paramount importance to almost all areas of computer vision. The quality of a learned object representation typically depends on certain properties such as invariance to image noise, e.g. uninformative background, and robustness to object rotation, translation, or occlusion. In addition, many applications require representations that enable comparisons, i.e. determining how semantically similar or dissimilar two objects are. However, arguably the most challenging aspect of learning object representations is ensuring generalization to unseen objects, object variations, and environments. While a large corpus of similarity learning literature exists on the former aspects, the latter, i.e. the generalization of object representations, is still poorly understood and thus rarely addressed explicitly. In this thesis, we analyze the current field of similarity learning and identify properties of object representations that correlate well with their generalization performance. We leverage our findings and propose novel methods that improve current approaches to similarity learning, both in terms of data sampling and learning problem formulation. To this end, we introduce several training tasks that complement the prevailing paradigm of standard class-discriminative learning and are eventually unified under the concept of Diverse Feature Aggregation. To facilitate the optimization of similarity learning approaches, we replace the commonly used heuristic, predefined data sampling strategies with a learnable sampling policy that adapts to the training state of our model. Typically, similarity learning finds applications in supervised learning problems. However, because more training data is becoming available and annotation processes are often tedious or even infeasible, unsupervised learning settings have attracted particular interest in recent years. In the second part of this thesis, we explore the effectiveness of similarity learning for obtaining informative representations of both static images and video sequences without the need for training labels. To enable learning, our approaches alternate between inferring data relations and refining our visual representations during training. In doing so, we resort to the classic divide-and-conquer principle: we decompose complex overall learning problems into feasible local subproblems whose solutions are subsequently consolidated to yield concerted, global representations. Throughout this work, we justify our contributions through rigorous analysis and strong model performance on standard benchmark sets, often outperforming previous state-of-the-art results.
date: 2021
id_scheme: DOI
id_number: 10.11588/heidok.00030731
ppn_swb: 1778244998
own_urn: urn:nbn:de:bsz:16-heidok-307312
date_accepted: 2021-11-04
language: eng
bibsort: MILBICHTIMVISUALSIMI2021
full_text_status: public
place_of_pub: Heidelberg
citation: Milbich, Timo (2021) Visual Similarity and Representation Learning. [Dissertation]
document_url: https://archiv.ub.uni-heidelberg.de/volltextserver/30731/1/thesis_timo_milbich.pdf
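The abstract centers on representations that enable comparisons, i.e. metric learning. As a minimal illustrative sketch only (not the thesis's actual method, and with all names and values hypothetical), the standard triplet margin loss captures this idea: an anchor embedding should lie at least a margin closer to a same-class sample than to a different-class one.

```python
import math

def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    # Penalize triplets where the anchor is not at least `margin`
    # closer to the positive (same class) than to the negative.
    d_pos = math.dist(anchor, positive)  # distance to a same-class sample
    d_neg = math.dist(anchor, negative)  # distance to a different-class sample
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D embeddings: the positive lies near the anchor, the negative far away.
anchor, positive, negative = (0.0, 0.0), (0.1, 0.0), (1.0, 0.0)
print(triplet_margin_loss(anchor, positive, negative))  # 0.0: margin already satisfied

# A violating triplet yields a positive loss that training would minimize.
print(triplet_margin_loss((0.0, 0.0), (1.0, 0.0), (0.1, 0.0)))
```

Which triplets are fed to such a loss is exactly the data sampling problem the abstract refers to; heuristic strategies pick triplets by fixed rules, whereas the thesis proposes a learnable sampling policy.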