title: Cross-lingual Semantic Role Labeling through Translation and Multilingual Learning
creator: Daza Arevalo, Jose Angel
subject: ddc-004
subject: 004 Data processing Computer science
subject: ddc-400
subject: 400 Linguistics
subject: ddc-420
subject: 420 English
subject: ddc-490
subject: 490 Other languages
description: Understanding an event means being able to answer the question "Who did what to whom?" (and perhaps also how, when, where...). The "what" in this question is called an event, and it is directly linked to a predicate, which admits event-specific roles for the participants that take part in the event. Semantic Role Labeling (SRL) is the task of assigning semantic argument structures to words or phrases in a sentence, comprising the predicate, its sense, the participants, and the roles they play in the event or state of affairs. Nowadays the prevailing method for SRL is supervised learning, hence the quality of SRL systems depends on annotated training resources.

In this thesis we address the problem of improving SRL performance for languages other than English. Because annotating SRL resources is time consuming, recent improvements in SRL have focused mainly on English; especially since deep learning became the state of the art (SOTA) in Natural Language Processing (NLP), annotated resources in other languages have not been sufficient to keep pace with the progress we witness for English. Earlier research has tried to address the lack of training resources in specific languages with bilingual annotation projection methods or with monolingual data augmentation approaches that generate more labeled data which can later be used to train a labeler. In this work we instead explore a novel and flexible Encoder-Decoder architecture for SRL that is robust enough to work with more than two languages at the same time, immediately benefiting from more of the available training data. We are the first to apply sequence transduction to monolingual and cross-lingual SRL, and we show that the Encoder-Decoder architecture yields performance competitive with sequence labeling approaches. Moreover, by capitalizing on existing Machine Translation (MT) research, our model learns to translate from English to other target languages and to label predicates and semantic roles on the target side within a single inference step. We show that, similar to multi-source machine translation, the proposed architecture can profit from multiple input languages and from knowledge learned during translation to improve labeling performance on the otherwise resource-poor target languages. We see potential for future development of this framework for diverse structured prediction tasks.

In addition, this work addresses the long-standing problem of SRL annotation incompatibility across languages in existing corpora; these divergences hinder the development of unified multilingual solutions for this task. To alleviate this problem, we define an automatic process for creating a new multilingual SRL corpus that is parallel, contains unified predicate senses and semantic roles across languages, and includes a manually validated test set on both the source and target sides. We demonstrate that this corpus is better suited than existing ones for joint multilingual training with neural models on lower-resource languages. Our work on this corpus is restricted to German, French, and Spanish as target languages; however, we see great potential to extend it to further languages.
In short, we propose the first model that can solve the SRL task in a single language as well as perform cross-lingual SRL via joint translation and semantic argument structure labeling, relying on high-quality MT. Additionally, our novel annotation projection method allows us to transfer existing annotations into new languages, creating a densely labeled parallel cross-lingual SRL resource with human-validated test data.

date: 2022
type: Dissertation
type: info:eu-repo/semantics/doctoralThesis
type: NonPeerReviewed
format: application/pdf
identifier: https://archiv.ub.uni-heidelberg.de/volltextserver/31756/1/Daza_Thesis_Revised.pdf
identifier: DOI:10.11588/heidok.00031756
identifier: urn:nbn:de:bsz:16-heidok-317560
identifier: Daza Arevalo, Jose Angel (2022) Cross-lingual Semantic Role Labeling through Translation and Multilingual Learning. [Dissertation]
relation: https://archiv.ub.uni-heidelberg.de/volltextserver/31756/
rights: info:eu-repo/semantics/openAccess
rights: http://archiv.ub.uni-heidelberg.de/volltextserver/help/license_urhg.html
language: eng