
Deep-learning based image synthesis with geometric models

Abu Alhaija, Hassan

Full text: PhD_Dissertation_Hassan_Abu_Alhaija.pdf (PDF, English, 19 MB)


Abstract

Data-driven machine learning approaches have made computer vision solutions more robust and easily adaptable to various circumstances. However, they are often limited by their dependency on large training datasets with accurate ground-truth annotations. In most scene understanding tasks, such as instance segmentation and object detection, training data is scarce since annotations cannot be measured directly from the real world with sensors but must instead be created manually by humans at great cost. Virtual scenes offer a feasible alternative in these cases, since full access to the underlying scene geometry enables fast and accurate annotation. However, scene understanding models trained on rendered images often perform poorly on real test images because of the difference in appearance between synthetic and real images. This thesis proposes several new methods for image synthesis, focusing on generating training images that can partially or fully replace real data when training deep-learning models. It first explores the use of augmented reality techniques for combining synthetic 3D objects with real scenes, which greatly reduces the effort needed to generate diverse training scenes with accurate annotations. We study and compare the effect of various factors of the image generation process on the performance of the trained scene understanding models. To overcome the limitations of rendering engines, we next propose a novel geometric image synthesis approach that generates geometrically consistent and controllable images. A deep neural network learns to imitate the rendering process while simultaneously optimizing an explicit realism objective, making the resulting images better suited for training scene understanding models. Finally, to remove the need for rendered images altogether, we introduce an unsupervised neural rendering model trained only on unpaired 3D models and real images of a similar object class. This is achieved by jointly learning the forward rendering and the backward decomposition processes. The results in this thesis indicate that deep-learning-based image synthesis models can be an efficient tool for generating realistic images and high-quality synthetic training data.
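To make the augmented-reality data generation idea concrete: when a synthetic 3D object is rendered with an alpha channel and composited onto a real photograph, a pixel-accurate instance mask comes for free from the render's alpha, with no manual labeling. The sketch below is a minimal illustration of that principle only, not the thesis pipeline; the file names, the alpha threshold, and the assumption that the render is already aligned to the real image's resolution are all illustrative.

import numpy as np
from PIL import Image

def composite_with_annotation(real_path, render_path, threshold=0.5):
    """Blend a rendered RGBA object over a real RGB photo and return the
    composite plus a binary instance mask derived from the alpha channel.

    Assumes the render has the same height and width as the real image.
    """
    real = np.asarray(Image.open(real_path).convert("RGB"), dtype=np.float32) / 255.0
    render = np.asarray(Image.open(render_path).convert("RGBA"), dtype=np.float32) / 255.0

    rgb, alpha = render[..., :3], render[..., 3:4]
    composite = alpha * rgb + (1.0 - alpha) * real   # standard "over" blend
    mask = alpha[..., 0] > threshold                 # free ground-truth instance mask

    return (composite * 255).astype(np.uint8), mask

# Hypothetical usage: every generated training image ships with an exact
# annotation, which is the cost advantage the abstract describes.
# image, mask = composite_with_annotation("street.jpg", "car_render.png")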

Document type: Dissertation
Supervisor: Rother, Prof. Dr. Carsten
Place of Publication: Heidelberg
Date of thesis defense: 29 October 2021
Date Deposited: 17 Nov 2021 10:10
Date: 2021
Faculties / Institutes: The Faculty of Mathematics and Computer Science > Department of Computer Science
DDC-classification: 004 Data processing & computer science
Controlled Keywords: Maschinelles Lernen (machine learning), Künstliche Intelligenz (artificial intelligence)
Uncontrolled Keywords: Computer Vision, Machine Learning, Computer Graphics, Neural Rendering