eprintid: 37230
rev_number: 15
eprint_status: archive
userid: 9251
dir: disk0/00/03/72/30
datestamp: 2025-09-04 11:03:57
lastmod: 2025-09-23 10:40:39
status_changed: 2025-09-04 11:03:57
type: masterThesis
metadata_visibility: show
creators_name: Bespalov, Sergej
title: Reducing Global Memory Accesses in DNN Training using Structured Weight Masking
subjects: ddc-004
divisions: fac-720000
divisions: i-160001
adv_faculty: af-19
abstract: Training large deep neural networks (DNNs) is often constrained by memory bandwidth, with frequent global memory accesses representing a significant performance bottleneck. This thesis investigates the potential of dynamic structured weight masking to alleviate this bottleneck during training, focusing on the ResMLP architecture, a feedforward network composed exclusively of multi-layer perceptrons. A novel framework implementing block-wise masking based on L2-norm magnitude and top-k selection was developed and evaluated on the CIFAR-10 dataset. The study systematically varied block sizes and sparsity ratios, analyzing the impact on classification accuracy, theoretical computational cost (FLOPs), and theoretical memory movement. Results indicate that model accuracy remains robust up to approximately 50% sparsity when the mask is also applied during the backward pass; beyond this threshold, classification accuracy degrades. Notably, larger blocks improve computational efficiency under a masked backward pass by producing hardware-friendly memory access patterns, whereas with an unmasked backward pass, smaller blocks preserve accuracy better. A key observation is the discrepancy between the substantial reduction in computationally active weights and the limited decrease in estimated memory movement, suggesting that tangible memory savings can only be achieved with hardware-aware implementations that bypass unnecessary data loads. Theoretical FLOPs decrease linearly with increasing sparsity, confirming the potential for computational efficiency gains. Overall, this work contributes an empirical analysis of dynamic structured weight masking in MLP-based architectures, offering insights into the trade-offs between mask ratio, block granularity, and training stability. The findings underscore the importance of co-designing masking patterns with hardware-aware data movement to achieve improvements in both computational cost and memory access, while also highlighting considerations for maintaining training stability. Furthermore, they provide practical guidelines for the efficient training of DNNs on systems with limited memory or computational resources.
date: 2025
id_scheme: DOI
id_number: 10.11588/heidok.00037230
ppn_swb: 1936556707
own_urn: urn:nbn:de:bsz:16-heidok-372307
date_accepted: 2025-06-10
language: eng
bibsort: BESPALOVSEREDUCINGGL2025
full_text_status: public
place_of_pub: Heidelberg
thesis_type: master
citation: Bespalov, Sergej (2025) Reducing Global Memory Accesses in DNN Training using Structured Weight Masking. [Master's thesis]
document_url: https://archiv.ub.uni-heidelberg.de/volltextserver/37230/1/Bespalov_Sergej.pdf
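
Note: The abstract describes block-wise masking based on L2-norm magnitude and top-k selection. The following is a minimal PyTorch sketch of such a scheme for illustration only, not the thesis implementation; the function name block_topk_mask and the parameters block_size and keep_ratio are assumed here, and the thesis additionally applies the resulting mask during the backward pass.

import torch

def block_topk_mask(weight: torch.Tensor, block_size: int, keep_ratio: float) -> torch.Tensor:
    """Build a binary mask that keeps the weight blocks with the largest L2 norms.

    weight     : 2-D weight matrix; both dimensions must be divisible by block_size
    block_size : edge length of the square blocks
    keep_ratio : fraction of blocks kept active (1 - sparsity)
    """
    rows, cols = weight.shape
    br, bc = rows // block_size, cols // block_size
    # Regroup the matrix into (br, bc, block_size, block_size) blocks.
    blocks = weight.reshape(br, block_size, bc, block_size).permute(0, 2, 1, 3)
    # L2 norm of each block.
    norms = blocks.reshape(br, bc, -1).norm(dim=-1)
    # Keep the top-k blocks by norm, zero out the rest.
    k = max(1, int(round(keep_ratio * br * bc)))
    keep = torch.topk(norms.flatten(), k).indices
    block_mask = torch.zeros(br * bc, device=weight.device)
    block_mask[keep] = 1.0
    # Expand the block-level mask back to the full weight shape.
    return (block_mask.reshape(br, bc)
            .repeat_interleave(block_size, dim=0)
            .repeat_interleave(block_size, dim=1))

# Example: 128x128 weight matrix, 16x16 blocks, 50% sparsity.
w = torch.randn(128, 128)
mask = block_topk_mask(w, block_size=16, keep_ratio=0.5)
masked_w = w * mask  # in a masked backward pass the same mask would also gate the weight gradient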