Is a Picture Worth a Thousand Words? A Deep Multi-Modal Architecture for Product Classification in E-Commerce

Tom Zahavy; Abhinandan Krishnan; Alessandro Magnani; Shie Mannor

doi:10.1609/aaai.v32i1.11419

Authors

Tom Zahavy Technion
Abhinandan Krishnan Walmart Labs
Alessandro Magnani Walmart Labs
Shie Mannor Technion

DOI:

https://doi.org/10.1609/aaai.v32i1.11419

Keywords:

Multi Modality, E-Commerce, Deep Learning

Abstract

Classifying products precisely and efficiently is a major challenge in modern e-commerce. The high traffic of new products uploaded daily and the dynamic nature of the categories raise the need for machine learning models that can reduce the cost and time of human editors. In this paper, we propose a decision-level fusion approach for multi-modal product classification based on text and image neural network classifiers. We train input-specific state-of-the-art deep neural networks for each input source, show the potential of forging them together into a multi-modal architecture, and train a novel policy network that learns to choose between them. Finally, we demonstrate that our multi-modal network improves classification accuracy over both networks on a real-world, large-scale product classification dataset that we collected from Walmart.com. While we focus on image-text fusion, which characterizes e-commerce businesses, our algorithms can be easily applied to other modalities such as audio, video, and physical sensors.
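
The decision-level fusion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract only states that a trained policy network chooses between the text and image classifiers, so everything concrete here is an assumption: the function names (`fuse_decisions`, `confidence_policy`), the use of class-probability vectors as the policy input, and the hand-written confidence rule that stands in for the learned policy.

```python
from typing import Callable, Sequence

# A policy takes both probability vectors and returns True to trust the image
# classifier, False to trust the text classifier.
Policy = Callable[[Sequence[float], Sequence[float]], bool]


def argmax(values: Sequence[float]) -> int:
    """Return the index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def confidence_policy(text_probs: Sequence[float],
                      image_probs: Sequence[float]) -> bool:
    """Hypothetical stand-in for the learned policy network.

    Assumption: pick the image classifier only when its top class
    probability is strictly higher than the text classifier's.
    """
    return max(image_probs) > max(text_probs)


def fuse_decisions(text_probs: Sequence[float],
                   image_probs: Sequence[float],
                   policy: Policy = confidence_policy) -> int:
    """Decision-level fusion: select one classifier's output, then predict."""
    if len(text_probs) != len(image_probs):
        raise ValueError("both classifiers must score the same set of classes")
    chosen = image_probs if policy(text_probs, image_probs) else text_probs
    return argmax(chosen)


if __name__ == "__main__":
    # Made-up probabilities over three product categories.
    text_probs = [0.6, 0.3, 0.1]
    image_probs = [0.05, 0.9, 0.05]
    print(fuse_decisions(text_probs, image_probs))
```

In this sketch the fusion happens after each classifier has produced its own prediction, which is what distinguishes decision-level fusion from combining features earlier in the networks. Replacing `confidence_policy` with a trained model that maps the two probability vectors to a choice would bring the sketch closer to the approach the abstract describes.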

Published

2018-04-27

How to Cite

Zahavy, T., Krishnan, A., Magnani, A., & Mannor, S. (2018). Is a Picture Worth a Thousand Words? A Deep Multi-Modal Architecture for Product Classification in E-Commerce. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://doi.org/10.1609/aaai.v32i1.11419

Issue

Vol. 32 No. 1 (2018): Thirty-Second AAAI Conference on Artificial Intelligence

Section

IAAI18 - Emerging
