Multi-Level Quantization of Stochastic Variational Inference based Bayesian Neural Networks

Wu, Yong

Citation of documents: Please do not cite the URL displayed in your browser's address bar; instead, use the DOI, URN, or persistent URL below, as we can guarantee their long-term accessibility.

DOI: 10.11588/heidok.00037233
URN: urn:nbn:de:bsz:16-heidok-372336

Abstract

Bayesian Neural Networks (BNNs) integrate the representational power of standard neural networks with the uncertainty estimation capabilities of Bayesian inference, offering a robust framework to address challenges such as overconfidence and overfitting. However, the inherent complexity of BNNs, which stems from their use of weight distributions, makes them computationally intensive and hinders their practical deployment, especially on edge devices. To overcome these challenges from the perspectives of edge deployment and inference speed, this thesis investigates a series of quantization strategies specifically tailored for BNNs. As the main benchmark, we use a synthetic dataset for a regression task. This dataset allows us to characterize both regression performance and the prediction of aleatoric and epistemic uncertainty. We first analyze the crucial role of input representation in the quantization process and introduce partition quantization based on thermometer coding to mitigate the impact of input quantization errors. Building on this foundation, we propose three quantization methods: one based on quantizing the values sampled from the distributions of the BNN, another based on quantizing the mean and variance of the underlying Gaussian distributions, and a combined strategy that integrates both approaches. Each method is evaluated by its impact on model performance under a fixed bit-width setting. Subsequently, we conduct a deeper analysis of how varying quantization bit-widths influence accuracy and uncertainty estimation. Extensive experiments demonstrate that our proposed quantization techniques substantially reduce computational complexity while maintaining prediction reliability, underscoring their potential for achieving efficient and robust BNN deployment in real-world, resource-constrained environments.
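
For readers unfamiliar with the terms used in the abstract, the following is a minimal illustrative sketch and not the thesis's implementation. It shows plain thermometer coding of a scalar input and three generic ways to combine uniform quantization with a Gaussian weight distribution: quantizing the sampled value, quantizing the distribution parameters, or both. All function names, the symmetric uniform quantizer, the fixed `scale`, and the use of the standard deviation in place of the variance are assumptions made for illustration. The thesis's partition quantization is not reproduced here.

```python
import random

def thermometer_encode(x, lo, hi, levels):
    """Unary ("thermometer") code: the first k of `levels` bits are 1,
    where k grows with x across the assumed input range [lo, hi]."""
    x = min(max(x, lo), hi)                     # clamp to the input range
    k = round((x - lo) / (hi - lo) * levels)    # number of active bits
    return [1 if i < k else 0 for i in range(levels)]

def quantize(v, scale, bits):
    """Symmetric uniform quantization to a signed `bits`-bit grid.
    Returns the dequantized value (an integer multiple of `scale`)."""
    qmax = 2 ** (bits - 1) - 1
    q = max(-qmax, min(qmax, round(v / scale)))
    return q * scale

def sample_then_quantize(mu, sigma, scale, bits, rng):
    """Draw a weight from N(mu, sigma^2), then quantize the sample."""
    return quantize(rng.gauss(mu, sigma), scale, bits)

def quantize_then_sample(mu, sigma, scale, bits, rng):
    """Quantize the Gaussian parameters, then sample in full precision."""
    return rng.gauss(quantize(mu, scale, bits), quantize(sigma, scale, bits))

def quantize_both(mu, sigma, scale, bits, rng):
    """Sample from the quantized parameters and quantize the sample as well."""
    w = rng.gauss(quantize(mu, scale, bits), quantize(sigma, scale, bits))
    return quantize(w, scale, bits)

if __name__ == "__main__":
    rng = random.Random(0)                      # fixed seed for repeatability
    print(thermometer_encode(0.5, 0.0, 1.0, 4))
    print(sample_then_quantize(0.3, 0.1, 0.25, 4, rng))
    print(quantize_then_sample(0.3, 0.1, 0.25, 4, rng))
    print(quantize_both(0.3, 0.1, 0.25, 4, rng))
```

With these assumptions, `thermometer_encode(0.5, 0.0, 1.0, 4)` yields `[1, 1, 0, 0]`, and a 4-bit grid with `scale=0.25` clips values to the range from -1.75 to 1.75. Note that a coarse grid can round a small standard deviation to zero, which collapses the weight distribution to a point and removes the sampled variation; this is one reason bit-width can be expected to affect uncertainty estimates.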

Document type:	Master's thesis
Supervisor:	Fröning, Prof. Dr. Holger
Place of Publication:	Heidelberg
Date of thesis defense:	2025
Date Deposited:	04 Sep 2025 10:58
Date:	2025
Faculties / Institutes:	Service facilities > Institut f. Technische Informatik (ZITI); Fakultät für Ingenieurwissenschaften > Dekanat der Fakultät für Ingenieurwissenschaften
DDC-classification:	004 Data processing Computer science
Collection:	Institute of Computer Engineering - Selected theses