
Aspects of Coherence for Entity Analysis

Heinzerling, Benjamin

PDF, English


Abstract

Natural language understanding is an important topic in natural language processing. Given a text, a computer program should, at the very least, be able to understand what the text is about, and ideally also situate it in its extra-textual context and understand what purpose it serves. What exactly it means to understand what a text is about is an open question, but it is generally accepted that, at a minimum, understanding involves being able to answer questions like “Who did what to whom? Where? When? How? And Why?”. Entity analysis, the computational analysis of entities mentioned in a text, aims to support answering the questions “Who?” and “Whom?” by identifying entities mentioned in a text. If the answers to “Where?” and “When?” are specific, named locations and events, entity analysis can also provide these answers. Entity analysis aims to answer these questions by performing entity linking, that is, linking mentions of entities to their corresponding entry in a knowledge base, coreference resolution, that is, identifying all mentions in a text that refer to the same entity, and entity typing, that is, assigning a label such as Person to mentions of entities. In this thesis, we study how different aspects of coherence can be exploited to improve entity analysis. Our main contribution is a method that allows exploiting knowledge-rich, specific aspects of coherence, namely geographic, temporal, and entity type coherence. Geographic coherence expresses the intuition that entities mentioned in a text tend to be geographically close. Similarly, temporal coherence captures the intuition that entities mentioned in a text tend to be close in the temporal dimension. Entity type coherence is based on the observation that in a text about a certain topic, such as sports, the entities mentioned in it tend to have the same or related entity types, such as sports team or athlete.
We show how to integrate features modeling these aspects of coherence into entity linking systems and establish their utility in extensive experiments covering different datasets and systems. Since entity linking often requires computationally expensive joint, global optimization, we propose a simple, but effective rule-based approach that enjoys some of the benefits of joint, global approaches, while avoiding some of their drawbacks. To enable convenient error analysis for system developers, we introduce a tool for visual analysis of entity linking system output. Investigating another aspect of coherence, namely the coherence between a predicate and its arguments, we devise a distributed model of selectional preferences and assess its impact on a neural coreference resolution system. Our final contribution examines how multilingual entity typing can be improved by incorporating subword information. We train and make publicly available subword embeddings in 275 languages and show their utility in a multilingual entity typing task.
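
The intuition behind geographic coherence can be illustrated with a small sketch. This is not the method developed in the thesis, only a toy example under stated assumptions: each candidate entity for an ambiguous mention is assumed to come with approximate (latitude, longitude) coordinates, and the candidate closest on average to the other entities in the text is preferred. The function names (`haversine_km`, `mean_distance_to_context`, `pick_most_coherent`) and the sample coordinates are hypothetical and chosen for illustration.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius


def mean_distance_to_context(candidate, context):
    """Average distance from one candidate location to the context locations."""
    if not context:
        return 0.0
    return sum(haversine_km(candidate, c) for c in context) / len(context)


def pick_most_coherent(candidates, context):
    """Return the candidate name whose location is closest to the context on average.

    candidates: dict mapping a candidate entity name to its (lat, lon)
    context:    list of (lat, lon) for entities already identified in the text
    """
    return min(candidates,
               key=lambda name: mean_distance_to_context(candidates[name], context))


# Toy example with approximate coordinates (assumed for illustration):
# the mention "Paris" in a text that also mentions Lyon and Marseille.
candidates = {
    "Paris, France": (48.8566, 2.3522),
    "Paris, Texas": (33.6609, -95.5555),
}
context = [(45.7640, 4.8357), (43.2965, 5.3698)]  # Lyon, Marseille
best = pick_most_coherent(candidates, context)
```

In this toy setting `best` is the French capital, since it lies far closer to the other entities mentioned. A real entity linking system would presumably combine such a signal with other features rather than rely on it alone.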

Document type: Dissertation
Supervisor: Strube, Prof. Dr. Michael
Date of thesis defense: 22 January 2019
Date Deposited: 11 Mar 2019 09:11
Date: 2019
Faculties / Institutes: Neuphilologische Fakultät > Institut für Computerlinguistik
DDC-classification: 004 Data processing Computer science