Abstract
Reasoning involves drawing inferences from existing knowledge. As a fundamental aspect of human intelligence, it has been widely explored through related tasks such as Natural Language Inference (NLI) and Commonsense Reasoning. While Large Language Models (LLMs) have shown impressive results on reasoning benchmarks, they struggle with complex reasoning tasks due to their limited compositional generalization ability. This ability allows models to combine known components to solve novel problems, i.e., to tackle complex reasoning tasks by composing primitive reasoning knowledge without extensive training data.
However, limited prior work has explored compositional generalization in reasoning. To address this gap, we take first steps toward *Understanding and Improving the Compositional Generalization Abilities of LLMs in Reasoning*. We investigate this research question from two perspectives: (i) whether, and to what extent, pretrained language models (PLMs) and LLMs perform compositional generalization in reasoning; and (ii) how their compositional generalization ability can be improved.
We introduce a novel benchmark comprising three related but independent systematicity tests in NLI, designed to provide a broad understanding of compositional generalization. Our findings reveal that models can handle unseen compositional inferences when they have prior knowledge of how to combine primitives, but are limited when such knowledge is absent. To address this, we demonstrate that exposing models to essential compositional knowledge through minimalistic examples can significantly enhance their capabilities.
Building on this foundational understanding, we explore more challenging scenarios to further investigate compositional generalization. Specifically, we introduce tasks in which compositional inferences are formed both structurally and continually. Results highlight the limitations of existing methods on these complex tasks. Inspired by traditional symbolic reasoning, we propose MORSE, a dynamic modularized reasoning model designed to handle structural compositional generalization; results show its effectiveness in dynamic reasoning and its superior performance in compositional generalization. For continual learning scenarios, we focus on the ordering of primitives and compositional inference types, and show that learning subtasks continually, in an order that respects their dependencies and increasing difficulty, enhances compositional generalization.
Lastly, we investigate the interplay between structural and continual compositional generalization. Specifically, we extend the structural compositional generalization task and investigate it using ordered primitives, inspired by continual learning. We find that different structural compositions pose varying levels of difficulty, and that arranging demonstrations in an easy-to-hard sequence based on structural complexity further boosts the compositional generalization ability of LLMs.
In summary, we propose a series of tests to evaluate the compositional generalization abilities of PLMs and LLMs in challenging reasoning scenarios. Our findings reveal limitations of current models and uncover their underlying causes. To address these shortcomings, we suggest strategies for improvement, including refining model architectures, optimizing data organization, and enhancing training strategies. We aim to advance the development of neural models with more human-like compositional reasoning and generalization capabilities.
| Document type: | Dissertation |
|---|---|
| Supervisor: | Frank, Prof. Dr. Anette |
| Place of Publication: | Heidelberg |
| Date of thesis defense: | 28 February 2025 |
| Date Deposited: | 25 Nov 2025 11:51 |
| Date: | 2025 |
| Faculties / Institutes: | Neuphilologische Fakultät > Institut für Computerlinguistik |
| DDC-classification: | 004 Data processing Computer science |
| Controlled Keywords: | Compositional Generalization, Reasoning |