eprintid: 30731
rev_number: 33
eprint_status: archive
userid: 6282
dir: disk0/00/03/07/31
datestamp: 2021-11-22 11:41:09
lastmod: 2021-11-26 06:46:40
status_changed: 2021-11-22 11:41:09
type: doctoralThesis
metadata_visibility: show
creators_name: Milbich, Timo
title: Visual Similarity and Representation Learning
subjects: ddc-004
divisions: i-110400
divisions: i-110300
adv_faculty: af-11
keywords: Deep Learning, Computer Vision, Representation Learning, Similarity Learning, Metric Learning, Image Classification, Image Retrieval
cterms_swd: Maschinelles Sehen
cterms_swd: Bildverstehen
cterms_swd: Mustererkennung
abstract: Computer Vision aims to artificially mimic the visual reasoning capabilities of humans by using algorithms which, once deployed to mechanical agents and software tools, improve car and traffic safety, enable effective visual search on the World Wide Web, or increase productivity and quality in industrial production processes. Similar to the reasoning processes that constantly occur in our brains, such algorithms directly rely on abstract representations of the objects in the visually perceivable environment and beyond. Consequently, learning informative representations that allow us to detect and recognize objects and to evaluate image scenes is of paramount importance to almost all areas of computer vision. The quality of a learned object representation typically depends on certain properties such as invariance to image noise, e.g. uninformative background, and robustness to object rotation, translation, or occlusion. In addition, many applications require representations that enable comparisons, i.e. determining how semantically similar or dissimilar two objects are. However, arguably the most challenging aspect of learning object representations is ensuring generalization to unseen objects, object variations, and environments. While a large corpus of similarity learning literature exists on the former aspects, the latter, i.e. the generalization of object representations, is still poorly understood and thus rarely addressed explicitly. In this thesis, we analyze the current field of similarity learning and identify properties of object representations that correlate well with their generalization performance. We leverage our findings and propose novel methods that improve current approaches to similarity learning, both in terms of data sampling and learning problem formulation. To this end, we introduce several training tasks that complement the prevailing paradigm of standard class-discriminative learning and are eventually unified under the concept of Diverse Feature Aggregation. To facilitate the optimization of similarity learning approaches, we replace the commonly used heuristic, predefined data sampling strategies with a learnable sampling policy that adapts to the training state of our model. Typically, similarity learning finds applications in supervised learning problems. However, because more training data is becoming available and annotation processes are often tedious or even infeasible, unsupervised learning settings have attracted particular interest in recent years. In the second part of this thesis, we explore the effectiveness of similarity learning for obtaining informative representations of both static images and video sequences without the need for training labels. To enable learning, our approaches alternate between inferring data relations and refining our visual representations during training. In doing so, we resort to the classic divide-and-conquer principle: we decompose complex overall learning problems into feasible local subproblems whose solutions are subsequently consolidated to yield concerted, global representations. Throughout this work, we justify our contributions through rigorous analysis and strong model performance on standard benchmark sets, often outperforming previous state-of-the-art results.
date: 2021
id_scheme: DOI
id_number: 10.11588/heidok.00030731
ppn_swb: 1778244998
own_urn: urn:nbn:de:bsz:16-heidok-307312
date_accepted: 2021-11-04
language: eng
bibsort: MILBICHTIMVISUALSIMI2021
full_text_status: public
place_of_pub: Heidelberg
citation: Milbich, Timo (2021) Visual Similarity and Representation Learning. [Dissertation]
document_url: https://archiv.ub.uni-heidelberg.de/volltextserver/30731/1/thesis_timo_milbich.pdf
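The abstract centers on representations that enable comparisons, i.e. metric learning. As a minimal illustrative sketch only (not the thesis's actual method, and with all names and values hypothetical), the standard triplet margin loss captures this idea: an anchor embedding should lie at least a margin closer to a same-class sample than to a different-class one.

```python
import math

def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    # Penalize triplets where the anchor is not at least `margin`
    # closer to the positive (same class) than to the negative.
    d_pos = math.dist(anchor, positive)  # distance to a same-class sample
    d_neg = math.dist(anchor, negative)  # distance to a different-class sample
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D embeddings: the positive lies near the anchor, the negative far away.
anchor, positive, negative = (0.0, 0.0), (0.1, 0.0), (1.0, 0.0)
print(triplet_margin_loss(anchor, positive, negative))  # 0.0: margin already satisfied

# A violating triplet yields a positive loss that training would minimize.
print(triplet_margin_loss((0.0, 0.0), (1.0, 0.0), (0.1, 0.0)))
```

Which triplets are fed to such a loss is exactly the data sampling problem the abstract refers to; heuristic strategies pick triplets by fixed rules, whereas the thesis proposes a learnable sampling policy.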