eprintid: 30168
rev_number: 14
eprint_status: archive
userid: 6002
dir: disk0/00/03/01/68
datestamp: 2021-07-15 06:16:50
lastmod: 2021-08-23 11:23:03
status_changed: 2021-07-15 06:16:50
type: doctoralThesis
metadata_visibility: show
creators_name: Dencker, Tobias
title: Beyond Supervised Learning: Exploring Alternative Forms of Supervision for Visual Recognition
subjects: ddc-004
divisions: i-110300
adv_faculty: af-11
cterms_swd: Computer Vision
cterms_swd: Machine Learning
cterms_swd: Object Detection
abstract: The rise of deep learning has significantly advanced the field of computer vision. Deep learning, especially in combination with supervised learning, has become the backbone of most computer vision algorithms. However, labeled data is often a bottleneck for data-hungry deep learning methods, limiting their performance and broader application. While large-scale annotation is expensive and time-consuming, humans know from experience that learning with limited supervision is possible. This thesis aims to study and devise computer vision algorithms that enable deep learning for tasks with limited supervision. Learning without full supervision requires exploiting alternative forms of supervision. A common approach is to use prior knowledge about the task to constrain the learning problem and compensate for the lack of supervision. This is difficult for deep learning models, whose end-to-end learning paradigm largely prevents prior control over the learned representation. Instead, this work pursues the idea of wrapping the supervised learning of a deep model into an iterative procedure that alternates between generating supervision from alternative sources and refining the model. To this end, models and learning procedures are developed and evaluated in two visual recognition applications, each with a different form of limited supervision.
The first application deals with representation learning for human pose analysis from unlabeled video data. Visual recognition of human pose has many interesting applications but is challenging because of the high variability of pose and appearance and because of self-occlusion. This thesis employs a self-supervised approach with two auxiliary tasks that generate their own supervision from the spatiotemporal information in videos. To increase the robustness of the learning procedure, the approach exploits the inherent self-similarity of human motion to refine the generated supervision and builds a learning curriculum that gradually increases in difficulty. The learned representation of human pose performs competitively with the state of the art. The second application addresses the challenge of reading cuneiform script on ancient clay tablets. To support Assyriologists in their analysis, a weakly supervised approach is proposed for training a cuneiform sign detector that locates and classifies cuneiform signs. Rather than requiring thousands of sign annotations, the approach learns incrementally by alternating between generating supervised data from weak supervision and training the sign detector. To improve the precision and recall of the sign detector, the supervision generation step implements an exploration-exploitation strategy that produces reliable and diverse examples for learning. The effectiveness of the proposed approach is thoroughly evaluated on the first large-scale dataset for cuneiform sign detection, which was established as part of this thesis. Finally, this thesis investigates linguistic refinement to further improve the results of the trained sign detector. A text correction model, learned in a self-supervised fashion, combines bottom-up information from the sign detections with top-down information from the language encoded in cuneiform script.
The approach takes the first steps toward the automatic transliteration of cuneiform clay tablets from images.
date: 2021
id_scheme: DOI
id_number: 10.11588/heidok.00030168
ppn_swb: 1765565405
own_urn: urn:nbn:de:bsz:16-heidok-301681
date_accepted: 2021-06-30
language: eng
bibsort: DENCKERTOBBEYONDSUPE2021
full_text_status: public
place_of_pub: Heidelberg
citation: Dencker, Tobias (2021) Beyond Supervised Learning: Exploring Alternative Forms of Supervision for Visual Recognition. [Dissertation]
document_url: https://archiv.ub.uni-heidelberg.de/volltextserver/30168/1/thesis_tdencker.pdf