eprintid: 27919
rev_number: 18
eprint_status: archive
userid: 4991
dir: disk0/00/02/79/19
datestamp: 2020-02-21 12:24:57
lastmod: 2020-02-27 11:03:19
status_changed: 2020-02-21 12:24:57
type: doctoralThesis
metadata_visibility: show
creators_name: Moosavi, Nafise Sadat
title: Robustness in Coreference Resolution
subjects: ddc-004
subjects: ddc-400
divisions: i-90500
adv_faculty: af-09
cterms_swd: Natural Language Processing
cterms_swd: Coreference resolution
cterms_swd: Robustness
abstract: Coreference resolution is the task of determining different expressions of a text that refer to the same entity. The resolution of coreferring expressions is an essential step for automatic interpretation of the text. While coreference information is beneficial for various NLP tasks like summarization, question answering, and information extraction, state-of-the-art coreference resolvers are barely used in any of these tasks. The problem is the lack of robustness in coreference resolution systems. A coreference resolver that gets higher scores on the standard evaluation set does not necessarily perform better than the others on a new test set. In this thesis, we introduce robustness in coreference resolution by (1) introducing a reliable evaluation framework for recognizing robust improvements, and (2) proposing a solution that results in robust coreference resolvers. As the first step of setting up the evaluation framework, we introduce a reliable evaluation metric, called LEA, that overcomes the drawbacks of the existing metrics. We analyze LEA based on various types of errors in coreference outputs and show that it results in reliable scores. In addition to an evaluation metric, we also introduce an evaluation setting in which we disentangle coreference evaluations from parsing complexities. Coreference resolution is affected by parsing complexities for detecting the boundaries of expressions that have complex syntactic structures.
We reduce the effect of parsing errors in coreference evaluation by automatically extracting a minimum span for each expression. We then emphasize the importance of out-of-domain evaluations and generalization in coreference resolution and discuss the reasons behind the poor generalization of state-of-the-art coreference resolvers. Finally, we show that enhancing state-of-the-art coreference resolvers with linguistic features is a promising approach for making coreference resolvers robust across domains. The incorporation of linguistic features with all their values does not improve the performance. However, we introduce an efficient pattern mining approach, called EPM, that mines all feature-value combinations that are discriminative for coreference relations. We then only incorporate feature-values that are discriminative for coreference relations. By employing EPM feature-values, performance improves significantly across various domains.
date: 2020
id_scheme: DOI
id_number: 10.11588/heidok.00027919
ppn_swb: 1691157325
own_urn: urn:nbn:de:bsz:16-heidok-279197
date_accepted: 2019-07-22
language: eng
bibsort: MOOSAVINAFROBUSTNESS2020
full_text_status: public
place_of_pub: Heidelberg
citation: Moosavi, Nafise Sadat (2020) Robustness in Coreference Resolution. [Dissertation]
document_url: https://archiv.ub.uni-heidelberg.de/volltextserver/27919/1/thesis.pdf