      On using Product-Specific Schema.org from Web Data Commons: An Empirical Set of Best Practices

      Preprint


          Abstract

          Schema.org has grown rapidly in recent years. Structured descriptions of products embedded in HTML pages are no longer uncommon, especially on e-commerce websites. The Web Data Commons (WDC) project has extracted schema.org data at scale from webpages in the Common Crawl and made it available as an RDF 'knowledge graph'. The portion of this data that specifically describes products offers a golden opportunity for researchers and small companies to leverage it for analytics and downstream applications. Yet, because of the broad scope of this data, it is not evident whether the data is usable in its raw form. In this paper, we conduct a detailed empirical study of the product-specific schema.org data made available by WDC. Rather than simple analysis, the goal of our study is to devise an empirically grounded set of best practices for using and consuming WDC product-specific schema.org data. Our study reveals six best practices, each of which is justified by experimental data and analysis.


          Author and article information

          Date: 27 July 2020
          arXiv: 2007.13829

          http://arxiv.org/licenses/nonexclusive-distrib/1.0/

          Custom metadata
          8 pages, 3 tables, 6 figures, published in Workshop on Knowledge Graphs and E-Commerce at KDD 2020 (non-archival)
          cs.IR

          Information & Library science
