eprintid: 22891
rev_number: 9
eprint_status: archive
userid: 2808
dir: disk0/00/02/28/91
datestamp: 2017-04-27 07:23:36
lastmod: 2017-05-04 08:43:37
status_changed: 2017-04-27 07:23:36
type: conferenceObject
metadata_visibility: show
creators_name: Primpeli, Anna
creators_name: Meusel, Robert
creators_name: Bizer, Christian
creators_name: Stuckenschmidt, Heiner
title: The Web Data Commons Structured Data Extraction
subjects: ddc-004
subjects: ddc-020
divisions: i-704000
pres_type: poster
cterms_swd: Markup Language
cterms_swd: Structured data
abstract: More and more websites annotate their content using different markup formats. These annotations cover a wide range of topics, such as persons, events, products, hotels, organizations and cities. The purpose of embedding structured data in HTML pages is to make the content of those pages understandable to web applications, which greatly facilitates retrieving and integrating data from different web pages. This poster gives an overview of the Web Data Commons structured data project for the year 2016. The Web Data Commons project extracts structured data from the web corpus provided by Common Crawl, the largest public web corpus, and offers the extracted data for public download. To process these huge amounts of data, Web Data Commons builds on its Extraction Framework and Amazon Web Services.
date: 2017-03-17
id_scheme: DOI
id_number: 10.11588/heidok.00022891
collection: c-50
ppn_swb: 1657841049
own_urn: urn:nbn:de:bsz:16-heidok-228911
language: eng
bibsort: PRIMPELIANTHEWEBDATA20170317
full_text_status: public
pages: 1
event_title: E-Science-Tage 2017: Forschungsdaten managen
event_location: Heidelberg University
event_dates: 16-17 Mar 2017
citation: Primpeli, Anna ; Meusel, Robert ; Bizer, Christian ; Stuckenschmidt, Heiner (2017) The Web Data Commons Structured Data Extraction. [Conference Item]
document_url: https://archiv.ub.uni-heidelberg.de/volltextserver/22891/1/est_poster_vice-uc_17-03-2017.pdf