
Visual Similarity and Representation Learning

Milbich, Timo

Main document: PDF, English (60 MB)


Abstract

Computer vision aims to artificially mimic the visual reasoning capabilities of humans using algorithms that, once deployed to mechanical agents and software tools, improve car and traffic safety, enable effective visual search on the World Wide Web, and increase productivity and quality in industrial production processes. Similar to the reasoning processes constantly occurring in our brains, such algorithms rely directly on abstract representations of the objects in the visually perceivable environment and beyond. Consequently, learning informative representations that allow objects to be detected and recognized, and image scenes to be evaluated, is of paramount importance to almost all areas of computer vision. The quality of a learned object representation typically depends on properties such as invariance to image noise (e.g., uninformative background) and robustness to object rotation, translation, or occlusion. In addition, many applications require representations that enable comparisons, i.e., that determine how semantically similar or dissimilar two objects are. However, arguably the most challenging aspect of learning object representations is ensuring generalization to unseen objects, object variations, and environments. While a large body of similarity learning literature addresses the former aspects, the latter, i.e., the generalization of object representations, is still poorly understood and thus rarely addressed explicitly. In this thesis, we analyze the current field of similarity learning and identify properties of object representations that correlate well with their generalization performance. We leverage our findings to propose novel methods that improve current approaches to similarity learning, both in terms of data sampling and in terms of learning problem formulation.
To this end, we introduce several training tasks that complement the prevailing paradigm of standard class-discriminative learning and are eventually unified under the concept of Diverse Feature Aggregation. To facilitate the optimization of similarity learning approaches, we replace the commonly used heuristic, predefined data sampling strategies with a learnable sampling policy that adapts to the training state of our model. Similarity learning is typically applied to supervised learning problems. However, because more and more training data is becoming available and annotation processes are often tedious or even infeasible, unsupervised learning settings have attracted particular interest in recent years. In the second part of this thesis, we explore the effectiveness of similarity learning for obtaining informative representations without the need for training labels, for both static images and video sequences. To enable learning, our approaches alternate between inferring data relations during training and refining our visual representations. In doing so, we resort to the classic divide-and-conquer principle: we decompose complex overall learning problems into feasible local subproblems whose solutions are subsequently consolidated into concerted, global representations. Throughout this work, we justify our contributions through rigorous analysis and strong model performance on standard benchmark sets, often outperforming previous state-of-the-art results.
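As a generic illustration of the distance-based comparisons the abstract refers to (not the method proposed in the thesis), the following Python sketch assumes that objects have already been encoded as embedding vectors. It computes a standard triplet margin loss on L2-normalized embeddings and selects the hardest negative from a candidate pool, which is one example of the kind of heuristic, predefined sampling strategy commonly used in similarity learning. The function names, the toy vectors, and the margin of 0.2 are illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Project a vector onto the unit hypersphere (a common choice in metric learning)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between the L2-normalized versions of a and b."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a_n, b_n)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 0.2) -> float:
    """Hinge loss that becomes zero once the positive is closer to the anchor
    than the negative by at least `margin` (the margin value is an assumption)."""
    return max(0.0, distance(anchor, positive) - distance(anchor, negative) + margin)


def hardest_negative(anchor: Vector, candidates: Sequence[Vector]) -> int:
    """Fixed heuristic sampling: return the index of the candidate closest to the anchor."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return min(range(len(candidates)), key=lambda i: distance(anchor, candidates[i]))


if __name__ == "__main__":
    # Hypothetical toy embeddings in 2-D.
    anchor, positive = [1.0, 0.0], [0.9, 0.1]
    negatives = [[0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
    idx = hardest_negative(anchor, negatives)
    print(idx, triplet_loss(anchor, positive, negatives[idx]))  # prints: 1 0.0
```

Hardest-negative mining is only one of several fixed heuristics (random or distance-weighted sampling are others); this sketch makes no claim about how the learned sampling policy described in the thesis selects its training samples.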

Document type: Dissertation
Supervisor: Ommer, Prof. Dr. Björn
Place of Publication: Heidelberg
Date of thesis defense: 4 November 2021
Date Deposited: 22 Nov 2021 11:41
Date: 2021
Faculties / Institutes: The Faculty of Mathematics and Computer Science > Institut für Mathematik
The Faculty of Mathematics and Computer Science > Department of Computer Science
DDC-classification: 004 Data processing, computer science
Controlled Keywords: Maschinelles Sehen (computer vision), Bildverstehen (image understanding), Mustererkennung (pattern recognition)
Uncontrolled Keywords: Deep Learning, Computer Vision, Representation Learning, Similarity Learning, Metric Learning, Image Classification, Image Retrieval