title: Cross-lingual Semantic Role Labeling through Translation and Multilingual Learning
creator: Daza Arevalo, Jose Angel
subject: ddc-004
subject: 004 Data processing Computer science
subject: ddc-400
subject: 400 Linguistics
subject: ddc-420
subject: 420 English
subject: ddc-490
subject: 490 Other languages
description: Understanding an event means being able to answer the question "Who did what to whom?" (and perhaps also how, when, where...). The "what" in this question is called an event, and it is directly linked to a predicate, which admits event-specific roles for the participants that take part in the event. Semantic Role Labeling (SRL) is the task of assigning semantic argument structures to words or phrases in a sentence, comprising the predicate, its sense, the participants, and the roles they play in the event or state of affairs. Nowadays the prevailing method for SRL is supervised learning, hence the quality of SRL systems depends on annotated training resources.

In this thesis we address the problem of improving SRL performance for languages other than English. Because annotating SRL resources is time consuming, recent improvements in SRL have focused mainly on English; especially since deep learning became the state of the art (SOTA) in Natural Language Processing (NLP), annotated resources in other languages have not been sufficient to keep pace with the progress we witness for English. Earlier research has tried to address the lack of training resources in specific languages with bilingual annotation projection methods or with monolingual data augmentation approaches that generate more labeled data which can later be used to train a labeler. In this work we instead explore a novel and flexible Encoder-Decoder architecture for SRL that is robust enough to work with more than two languages at the same time, immediately benefiting from more of the available training data. We are the first to apply sequence transduction to monolingual and cross-lingual SRL, and we show that the Encoder-Decoder architecture yields performance competitive with sequence labeling approaches. Moreover, by capitalizing on existing Machine Translation (MT) research, our model learns to translate from English to other target languages and to label predicates and semantic roles on the target side within a single inference step. We show that, similar to multi-source machine translation, the proposed architecture can profit from multiple input languages and from knowledge learned during translation to improve labeling performance on the otherwise resource-poor target languages. We see potential for future development of this framework for diverse structured prediction tasks.

In addition, this work addresses the long-standing problem of SRL annotation incompatibility across languages in existing corpora; these divergences hinder the development of unified multilingual solutions for this task. To alleviate this problem, we define an automatic process for creating a new multilingual SRL corpus that is parallel, contains unified predicate senses and semantic roles across languages, and includes a manually validated test set on both the source and target sides. We demonstrate that this corpus is better suited than existing ones for joint multilingual training with neural models on lower-resource languages. Our work on this corpus is restricted to German, French, and Spanish as target languages; however, we see great potential to extend it to further languages.
In short, we propose the first model that can solve the SRL task in a single language as well as perform cross-lingual SRL via joint translation and semantic argument structure labeling, relying on high-quality MT. Additionally, our novel annotation projection method allows us to transfer existing annotations into new languages, creating a densely labeled parallel cross-lingual SRL resource with human-validated test data.

date: 2022
type: Dissertation
type: info:eu-repo/semantics/doctoralThesis
type: NonPeerReviewed
format: application/pdf
identifier: https://archiv.ub.uni-heidelberg.de/volltextserver/31756/1/Daza_Thesis_Revised.pdf
identifier: DOI:10.11588/heidok.00031756
identifier: urn:nbn:de:bsz:16-heidok-317560
identifier: Daza Arevalo, Jose Angel (2022) Cross-lingual Semantic Role Labeling through Translation and Multilingual Learning. [Dissertation]
relation: https://archiv.ub.uni-heidelberg.de/volltextserver/31756/
rights: info:eu-repo/semantics/openAccess
rights: http://archiv.ub.uni-heidelberg.de/volltextserver/help/license_urhg.html
language: eng