Empirical Best Practices On Using Product-Specific Schema.org

Authors

  • Mayank Kejriwal, University of Southern California
  • Ravi Kiran Selvam, University of Southern California
  • Chien-Chun Ni, Verizon Media
  • Nicolas Torzec, Verizon Media

DOI:

https://doi.org/10.1609/aaai.v35i17.17816

Keywords:

E-commerce, Schema.org, Best Practices, Web Data Commons, Common Crawl, Markup, Big Data

Abstract

Schema.org has experienced high growth in recent years. Structured descriptions of products embedded in HTML pages are no longer uncommon, especially on e-commerce websites. The Web Data Commons (WDC) project has extracted schema.org data at scale from webpages in the Common Crawl and made it available as an RDF "knowledge graph". The portion of this data that specifically describes products offers a golden opportunity for researchers and small companies to leverage it for analytics and downstream applications. Yet, because of the broad and expansive scope of this data, it is not evident whether the data is usable in its raw form. In this paper, we conduct a detailed empirical study of the product-specific schema.org data made available by WDC. Rather than simple analysis, the goal of our study is to devise an empirically grounded set of best practices for using and consuming WDC product-specific schema.org data. Our studies reveal five best practices, each of which is justified by experimental data and analysis.

Published

2021-05-18

How to Cite

Kejriwal, M., Selvam, R. K., Ni, C.-C., & Torzec, N. (2021). Empirical Best Practices On Using Product-Specific Schema.org. Proceedings of the AAAI Conference on Artificial Intelligence, 35(17), 15452-15457. https://doi.org/10.1609/aaai.v35i17.17816

Section

IAAI Technical Track on AI Best Practices, Challenge Problems, Training AI Users