
Deep Generative Models for Image-to-Image Translation

Durall Lopez, Ricard

PDF, English - main document. Download (55MB)
License: Deep Generative Models for Image-to-Image Translation by Durall Lopez, Ricard is subject to the terms of the Creative Commons Attribution-NonCommercial 3.0 Germany licence.

Citation of documents: Please do not cite the URL displayed in your browser's location bar; instead, use the DOI, URN, or persistent URL below, as only for these can we guarantee long-term accessibility.

Abstract

The rise of artificial intelligence has significantly impacted the field of computer vision. In particular, deep learning has advanced the development of algorithms that comprehend visual data and infer information about the environment, i.e., that mimic human vision. Among the wide variety of visual algorithms, in this thesis we study and devise generative deep-learning models that enable image-to-image translation tasks, including style transfer and attribute manipulation. Such editing capabilities are valuable in scenarios where additional data with certain properties is required but is not available a priori, or only in very limited quantities.

In recent years, data has become the new gold in many domains, and deep-learning approaches are no exception. Indeed, the main Achilles' heel of these models is the enormous amount of labelled data they require. We therefore begin this work by presenting a few-shot learning system that exploits alternative forms of supervision, successfully completing translation tasks with a very limited number of samples. In this way, we open the door to less data-demanding image-to-image systems.

A second focus of this thesis is the exploration and analysis of novel end-to-end models that incorporate inpainting modules to further improve their editing abilities. To that end, we assess different architectures and loss terms, together with semantic manipulations (label information) and geometric manipulations (mask information) as input control signals. The experimental evaluation of these scenarios allows us to gain insight into the role these elements play when applying style and attribute modifications.

Furthermore, we conduct a frequency-spectrum analysis of both forged (deepfake) and generated images, with particular attention to our image-to-image context. From this, we derive and discuss the effects that up-convolutional units might have on the final outputs, such as artefacts in the high-frequency band.

Last but not least, we present an image-to-image translation system for a real-world application: the identification of seismic events such as diffractions and faults. The goal here is to combine two academic disciplines, computer vision and geophysics, into one project, drawing on and integrating their knowledge to solve a given seismic problem.
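To make the frequency-spectrum analysis mentioned above concrete, the sketch below reduces the 2D Fourier power spectrum of a greyscale image to a 1D profile by averaging over rings of equal spatial frequency (azimuthal averaging), a common way to expose high-frequency artefacts introduced by up-convolutional units. This is a minimal illustration assuming numpy; the function name azimuthal_average and the placeholder input are illustrative choices, not the thesis's actual pipeline.

    import numpy as np

    def azimuthal_average(image):
        """Reduce the 2D Fourier power spectrum of a greyscale image
        to a 1D profile by averaging over rings of equal frequency."""
        # Shift the zero-frequency component to the centre of the spectrum.
        fft = np.fft.fftshift(np.fft.fft2(image))
        power = np.abs(fft) ** 2

        h, w = power.shape
        cy, cx = h // 2, w // 2
        y, x = np.indices((h, w))
        # Integer radius of every pixel from the spectrum centre.
        r = np.hypot(y - cy, x - cx).astype(int)

        # Mean power per radius ring gives the 1D spectral profile.
        sums = np.bincount(r.ravel(), weights=power.ravel())
        counts = np.bincount(r.ravel())
        return sums / np.maximum(counts, 1)

    # Example: for a generated image, a profile that stays flat or rises in
    # the highest-frequency bins can hint at up-convolution artefacts.
    img = np.random.rand(256, 256)  # placeholder for a real greyscale image
    profile = azimuthal_average(img)

Comparing such profiles between real and generated images is one way to quantify the spectral discrepancies the abstract refers to; the thesis itself may use a different normalisation or pre-processing.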

Document type: Dissertation
Supervisor: Köthe, Prof. Dr. Ullrich
Place of Publication: Heidelberg
Date of thesis defense: 3 November 2022
Date Deposited: 15 Nov 2022 14:18
Date: 2022
Faculties / Institutes: The Faculty of Mathematics and Computer Science > Dean's Office of The Faculty of Mathematics and Computer Science
The Faculty of Mathematics and Computer Science > Department of Computer Science
DDC classification: 004 Data processing & computer science