TY - GEN
UR - https://archiv.ub.uni-heidelberg.de/volltextserver/35059/
TI - Towards a Unified Framework for Aspect-based Multi-document Text Summarization
AV - public
KW - Natural Language Processing
KW - Text Summarization
ID - heidok35059
N2 - For a growing number of knowledge workers, the rapid ingestion of textual information is crucial for their daily tasks. Confronted with expansive bodies of text, the fastest way to glean central pieces of information is usually a summary, condensing the most relevant points into a shorter piece of text. However, the manual curation of high-quality text summaries is a laborious and time-intensive task, requiring intense focus and attention. This motivates the central topic of this thesis: the automatic generation of textual summaries. Instead of relying on humans, we intend to summarize texts with the help of algorithms designed to capture the most important content. Yet, despite decades of research into automatic text summarization systems, we are still not at a point where the resulting algorithms could provide the basis for a product that sees large-scale adoption by the general public. This thesis focuses on this gap and provides a fundamental framework to address some of the remaining shortcomings in automatic text summarization systems. We investigate the direction of current research and detail key challenges, which we divide into three central problems. 1) Modern neural network-based approaches to text summarization are extremely data-hungry, yet high-quality, task-specific data remains a scarce resource, particularly for languages besides English. 2) From a modeling perspective, we also point out that existing works over-index on narrow domains, such as news summarization, and rarely include user-centric perspectives for summary generation. 3) We reiterate the lack of comprehensive and meaningful evaluations of text summarization systems.
Where systemic comparisons nowadays rely on a singular ground truth and metric scores, subjective and nuanced differences in a summary should be included in more evaluations again. For all three of these focus areas (data, evaluation, and models), we work towards the elimination of remaining issues under a shared theoretical framework. We introduce two new datasets suitable for research purposes, enabling multilingual and domain-specific summarization applications, and ensure their quality standards with semi-automatic filtering techniques. To improve the utility of evaluations, we further provide an overview of failure cases in existing evaluation setups and reiterate the necessity of focusing on truthful summary generation by providing a metric for factuality-focused evaluation of generated summaries. Aggregating these insights from our investigation of existing limitations, we introduce a two-staged hybrid summarization model, combining a multi-aspect-oriented retrieval system with a similarly aspect-compatible re-writing module as a second stage. We hypothesize that this framework allows for a more user-centric experience for text summarization systems by enabling a customizable generation depending on user needs. The final two chapters focus on the practical consequences of such a two-staged model, using the example of specific generation and retrieval aspects, and how these can be improved.
A1 - Aumiller, Dennis
Y1 - 2024///
CY - Heidelberg
ER -