title: Transforming data silos into knowledge: Early Chinese Periodicals Online (ECPO)
creator: Arnold, Matthias
creator: Hessel, Lena
subject: ddc-004
subject: 004 Data processing Computer science
subject: ddc-020
subject: 020 Library and information sciences
subject: ddc-370
subject: 370 Education
subject: ddc-490
subject: 490 Other languages
subject: ddc-700
subject: 700 The arts
subject: ddc-950
subject: 950 General history of Asia Far East
description: This paper introduces the project “Early Chinese Periodicals Online (ECPO)” [1]. ECPO joins several important digital collections of the early Chinese press and puts them into a single overarching framework. In a first phase, several databases on early women’s periodicals and entertainment publishing were created: “Chinese Women’s Magazines in the Late Qing and Early Republican Period” (WoMag), “Chinese Entertainment Newspapers” (Xiaobao), and databases hosted at the Academia Sinica in Taiwan. These systems approach the material in two ways: in the intensive approach all articles, images, advertisements, and related agents are recorded and assigned to a complete set of scanned pages, while in the extensive approach the main characteristic features of publications are stored. ECPO has begun to join these various materials in a second, ongoing phase of the project. Today, ECPO provides open access to 267 publications comprising over 280,000 pages of print. A key aspect is to make entire issues available, front-to-back, including illustrations, advertisements, and even blank pages. For 138 publications we also provide descriptions of individual items in Chinese with Pinyin transcription. These records also contain genre and column information, basic content analysis, as well as names and roles of agents associated with an item.
Our new cross-database agent service allows us to manage the approximately 47,000 names recorded in WoMag and ECPO: a) merge identical names across databases, b) identify agents and assign names to them, and c) link agent records to authority data (GND, VIAF, Wikidata). Besides creating a curated list of agents occurring in the publications, we also aim to add missing persons to authority files like the GND. One crucial aspect of ECPO is full-text capability. Unfortunately, OCR software cannot be used out of the box, for a number of reasons: document analysis fails to recognize complex newspaper layouts, character recognition fails when it faces emphasis marks next to characters, and recognized passages have to be grouped in the right semantic order. The paper will discuss approaches to further exploring and analyzing the knowledge hidden in these publications, together with efforts to open the collection’s data for re-use. We will demonstrate workflows in the Agents service and cross-database record curation. We also present results from a crowdsourced approach to newspaper segmentation, which generates segments that can be OCRed more easily. In addition, we introduce first ideas for a module that encodes text in TEI and relates it to the database.
publisher: Heidelberg University Press
date: 2019
type: Conference Item
type: info:eu-repo/semantics/conferenceObject
type: NonPeerReviewed
format: application/pdf
identifier: https://archiv.ub.uni-heidelberg.de/volltextserver/27325/3/Arnold_Hessel_2019.pdf
identifier: DOI:10.11588/heidok.00027325
identifier: urn:nbn:de:bsz:16-heidok-273253
identifier: Arnold, Matthias ; Hessel, Lena (2019) Transforming data silos into knowledge: Early Chinese Periodicals Online (ECPO).
[Conference Item]
relation: https://archiv.ub.uni-heidelberg.de/volltextserver/27325/
rights: info:eu-repo/semantics/openAccess
rights: http://archiv.ub.uni-heidelberg.de/volltextserver/help/license_urhg.html
language: eng