eprintid: 29167
rev_number: 32
eprint_status: archive
userid: 5576
dir: disk0/00/02/91/67
datestamp: 2020-12-09 14:15:02
lastmod: 2020-12-17 14:19:35
status_changed: 2020-12-09 14:15:02
type: doctoralThesis
metadata_visibility: show
creators_name: Hosseini Jafari, Omid
title: Exploring Subtasks of Scene Understanding: Challenges and Cross-Modal Analysis
subjects: ddc-004
divisions: i-110300
adv_faculty: af-13
abstract: Scene understanding is one of the most important problems in computer vision. It consists of many subtasks: image classification, which describes an image with one word; object detection, which finds and localizes objects of interest in the image and assigns a category to each of them; semantic segmentation, which assigns a category to each pixel of an image; instance segmentation, which finds and localizes objects of interest and marks all the pixels belonging to each object; depth estimation, which estimates the distance of each pixel in the image from the camera; and others. Each of these tasks has its advantages and limitations. They share a common goal: to understand and describe a scene captured in an image or a set of images. A natural question is whether there is any synergy between these tasks. Therefore, alongside single-task approaches, there is a line of research on how to learn multiple tasks jointly. In this thesis, we explore different subtasks of scene understanding and propose mainly deep-learning-based approaches to improve these tasks. First, we propose a modular Convolutional Neural Network (CNN) architecture for jointly training semantic segmentation and depth estimation. We provide a setup suitable for analyzing the cross-modality influence between these tasks for different architecture designs. Then, we use object detection and instance segmentation as auxiliary tasks to focus on target objects in the complex tasks of scene flow estimation and 6D object pose estimation.
Furthermore, we propose a novel deep learning approach for object co-segmentation, the task of segmenting common objects in a set of images. Finally, we introduce a novel pooling layer that preserves spatial information while capturing a large receptive field. This pooling layer is designed to improve dense prediction tasks such as semantic segmentation and depth estimation.
date: 2020
id_scheme: DOI
id_number: 10.11588/heidok.00029167
ppn_swb: 1743142803
own_urn: urn:nbn:de:bsz:16-heidok-291675
date_accepted: 2020-11-24
language: eng
bibsort: HOSSEINIJAEXPLORINGS2020
full_text_status: public
place_of_pub: Heidelberg
citation: Hosseini Jafari, Omid (2020) Exploring Subtasks of Scene Understanding: Challenges and Cross-Modal Analysis. [Dissertation]
document_url: https://archiv.ub.uni-heidelberg.de/volltextserver/29167/1/thesis.pdf