
Analysis of Adversarial Examples

Lorenz, Peter

PDF, English - main document (11 MB)

Citation of documents: Please do not cite the URL shown in your browser's address bar; instead, use the DOI, URN, or persistent URL below, as we can guarantee their long-term accessibility.

Abstract

The rise of artificial intelligence (AI) has significantly impacted the field of computer vision (CV). In particular, deep learning (DL) has advanced the development of algorithms that comprehend visual data. On specific tasks, DL matches human capabilities and affects our everyday lives, for example in virtual assistants, entertainment, and web search. Despite the success of visual algorithms, in this thesis we study the threat of adversarial examples: input manipulations that lead to misclassification. The human visual system is not impaired by such manipulations and still recognizes the correct image, whereas for a DL classifier a single-pixel change can be enough to cause misclassification. This reveals a misalignment between human vision and CV systems. We therefore begin this work by presenting the concept of a classification model to understand how these models can be tricked by the threat model, adversarial examples. We then analyze adversarial examples in the Fourier domain, because after this transformation they can be identified more reliably for detection. To that end, we assess different adversarial attacks on various classification models and datasets that deviate from the standard benchmarks. As a complementary approach, we develop an anti-pattern that places a frame-like patch (prompt) on the input image to counteract the input manipulation. Instead of merely identifying and discarding adversarial inputs, this prompt neutralizes adversarial perturbations at test time. As another detection method, we extend the use of a characteristic of multi-dimensional data, the local intrinsic dimensionality (LID), to differentiate between benign and attacked images, improving detection rates of adversarial examples. Recent advances in diffusion models (DMs) have significantly improved the adversarial robustness of models. Although DMs are well known for their generative abilities, it remains unclear whether adversarial examples are part of the distribution learned by the DM. To address this gap, we propose a methodology that aims to determine whether adversarial examples lie within the distribution of the learned manifold of the DM. We present an exploration of transforming adversarial images with the DM, which can reveal the attacked images.
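
The Fourier-domain analysis mentioned in the abstract can be illustrated with a minimal sketch: an image is moved into the frequency domain, where adversarial perturbations are often easier to spot than in pixel space. The function below is a generic NumPy illustration, not code from the thesis; the load_image_pair helper in the usage comment is purely hypothetical.

    import numpy as np

    def fourier_magnitude_spectrum(image: np.ndarray) -> np.ndarray:
        """Log-magnitude 2D Fourier spectrum of a grayscale image.

        Adversarial perturbations frequently leave traces in the
        high-frequency part of this spectrum, which is what a
        Fourier-based detector inspects.
        """
        freq = np.fft.fft2(image)        # 2D discrete Fourier transform
        freq = np.fft.fftshift(freq)     # move the zero frequency to the center
        return np.log1p(np.abs(freq))    # log scale for easier comparison

    # Hypothetical usage: compare the spectra of a benign and an attacked image.
    # benign, attacked = load_image_pair()   # placeholder loader, not from the thesis
    # delta = fourier_magnitude_spectrum(attacked) - fourier_magnitude_spectrum(benign)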
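Similarly, the local intrinsic dimensionality (LID) score used for detection can be estimated from nearest-neighbor distances. The sketch below implements the standard maximum-likelihood estimator (as popularized for adversarial detection by Ma et al., ICLR 2018); it is an illustrative assumption, not necessarily the exact variant used in the thesis.

    import numpy as np

    def lid_mle(knn_distances: np.ndarray) -> float:
        """Maximum-likelihood LID estimate from k-nearest-neighbor distances.

        Uses LID = -(1/k * sum_i log(r_i / r_k))**-1, where r_k is the
        distance to the farthest of the k neighbors. Attacked inputs tend
        to receive higher LID scores than benign ones, which is what the
        detector thresholds on.
        """
        r = np.sort(knn_distances)
        r = r[r > 0]                      # drop zero distances to avoid log(0)
        return -1.0 / np.mean(np.log(r / r[-1]))

    # Illustrative usage with random neighbor distances (not real data):
    # rng = np.random.default_rng(0)
    # print(lid_mle(rng.uniform(0.1, 1.0, size=20)))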

Document type: Dissertation
Supervisor: Koethe, Prof. Dr. Ullrich
Place of Publication: Heidelberg
Date of thesis defense: 30 July 2024
Date Deposited: 02 Aug 2024 06:38
Date: 2024
Faculties / Institutes: The Faculty of Mathematics and Computer Science > Department of Computer Science
DDC-classification: 004 Data processing, computer science