eprintid: 29167
rev_number: 32
eprint_status: archive
userid: 5576
dir: disk0/00/02/91/67
datestamp: 2020-12-09 14:15:02
lastmod: 2020-12-17 14:19:35
status_changed: 2020-12-09 14:15:02
type: doctoralThesis
metadata_visibility: show
creators_name: Hosseini Jafari, Omid
title: Exploring Subtasks of Scene Understanding: Challenges and Cross-Modal Analysis
subjects: ddc-004
divisions: i-110300
adv_faculty: af-13
abstract: Scene understanding is one of the most important problems in computer vision. It comprises many subtasks, including image classification (describing an image with one word), object detection (finding and localizing objects of interest in the image and assigning a category to each of them), semantic segmentation (assigning a category to each pixel of an image), instance segmentation (finding and localizing objects of interest and marking all the pixels belonging to each object), and depth estimation (estimating the distance of each pixel in the image from the camera). Each of these tasks has its advantages and limitations. They share a common goal: to understand and describe a scene captured in an image or a set of images. A natural question is whether there is any synergy between these tasks. Therefore, alongside single-task approaches, there is a line of research on how to learn multiple tasks jointly.

In this thesis, we explore different subtasks of scene understanding and propose mainly deep learning-based approaches to improve these tasks. First, we propose a modular Convolutional Neural Network (CNN) architecture for jointly training semantic segmentation and depth estimation tasks. We provide a setup suitable to analyze the cross-modality influence between these tasks for different architecture designs.
Then, we utilize object detection and instance segmentation as auxiliary tasks for focusing on target objects in the complex tasks of scene flow estimation and 6D object pose estimation.

Furthermore, we propose a novel deep learning approach for object co-segmentation, the task of segmenting common objects in a set of images.
Finally, we introduce a novel pooling layer that preserves spatial information while capturing a large receptive field. This pooling layer is designed to improve dense prediction tasks such as semantic segmentation and depth estimation.
date: 2020
id_scheme: DOI
id_number: 10.11588/heidok.00029167
ppn_swb: 1743142803
own_urn: urn:nbn:de:bsz:16-heidok-291675
date_accepted: 2020-11-24
language: eng
bibsort: HOSSEINIJAEXPLORINGS2020
full_text_status: public
place_of_pub: Heidelberg
citation: Hosseini Jafari, Omid (2020) Exploring Subtasks of Scene Understanding: Challenges and Cross-Modal Analysis. [Dissertation]
document_url: https://archiv.ub.uni-heidelberg.de/volltextserver/29167/1/thesis.pdf