%0 Generic
%9 Master thesis
%A Wu, Yong
%C Heidelberg
%D 2025
%F heidok:37233
%R 10.11588/heidok.00037233
%T Multi-Level Quantization of Stochastic Variational Inference based Bayesian Neural Networks
%U https://archiv.ub.uni-heidelberg.de/volltextserver/37233/
%X Bayesian Neural Networks (BNNs) integrate the representational power of standard neural networks with the uncertainty estimation capabilities of Bayesian Inference, offering a robust framework to address challenges such as overconfidence and overfitting. However, the inherent complexity of BNNs —due to the use of weight distributions— renders the process computationally intensive, thereby hindering their practical deployment, especially on edge devices. To overcome these challenges from the perspectives of edge deployment and inference speed, this thesis investigates a series of quantization strategies specifically tailored for BNNs.  As the main benchmark we utilize a synthetic dataset for a regression task. This dataset specifically allows us to characterize both regression performance and uncertainty prediction of aleatoric and epistemic uncertainty.  We first analyze the crucial role of input representation in the quantization process and introduce partition quantization based on thermometer coding to mitigate the impact of input quantization errors. Building on this foundation, we propose three quantization methods: one based on the quantization of values sampled from the distributions of the BNN, another leveraging the mean and variance  of the utilized Gaussian distributions, and a combined strategy that integrates both approaches. Each method is evaluated based on its impact on model performance under a fixed bit-width setting. Subsequently, we conduct a deeper analysis of how varying quantization bit-widths influence accuracy and uncertainty estimation.  Extensive experiments demonstrate that our proposed quantization techniques substantially reduce computational complexity while maintaining prediction reliability, underscoring their potential for achieving efficient and robust BNN deployment in real-world, resourceconstrained environments.