TY  - GEN
AV  - public
Y1  - 2017/03/17/
TI  - The Web Data Commons Structured Data Extraction
ID  - heidok22891
EP  - 1
A1  - Primpeli, Anna
A1  - Meusel, Robert
A1  - Bizer, Christian
A1  - Stuckenschmidt, Heiner
UR  - https://archiv.ub.uni-heidelberg.de/volltextserver/22891/
N2  - More and more websites annotate their content using different markup formats. These annotations cover a large number of topics such as persons, events, products, hotels, organizations and cities. The purpose of embedding structured data in HTML pages is to make the content of those pages understandable to web applications. In this way, the retrieval and integration of data from different web pages is greatly facilitated. The presented poster gives an overview of the Web Data Commons structured data project for the year 2016. The Web Data Commons project extracts structured data from the web corpus provided by Common Crawl, the largest public web corpus, and offers the extracted data for public download. In order to process these huge amounts of data, Web Data Commons builds upon its Extraction Framework and Amazon Web Services.
ER  -