DinTucker: Scaling Up Gaussian Process Models on Large Multidimensional Arrays

Authors

  • Shandian Zhe, Purdue University
  • Yuan Qi, Purdue University
  • Youngja Park, IBM Thomas J. Watson Research Center
  • Zenglin Xu, University of Electronic Science and Technology of China
  • Ian Molloy, IBM Thomas J. Watson Research Center
  • Suresh Chari, IBM Thomas J. Watson Research Center

DOI:

https://doi.org/10.1609/aaai.v30i1.10222

Keywords:

Large Scale Tensor Decomposition, Multidimensional array analysis, Distributed Tensor Decomposition, Map-Reduce, Gaussian Process, Nonlinear Tensor Decomposition, Local Gaussian Process

Abstract

Tensor decomposition methods are effective tools for modelling multidimensional array data (i.e., tensors). Among them, nonparametric Bayesian models, such as Infinite Tucker Decomposition (InfTucker), are more powerful than multilinear factorization approaches, including Tucker and PARAFAC, and usually achieve better predictive performance. However, they are difficult to apply to massive data because of their prohibitively high training cost. To address this limitation, we propose Distributed infinite Tucker (DinTucker), a new hierarchical Bayesian model that enables local learning of InfTucker on subarrays and global integration of the local results. We further develop a distributed stochastic gradient descent algorithm, coupled with variational inference, for model estimation. In addition, we reveal the connection between DinTucker and InfTucker in terms of model evidence. Experiments demonstrate that DinTucker maintains the predictive accuracy of InfTucker and scales to massive data: on multidimensional arrays with billions of elements from two real-world applications, DinTucker achieves significantly higher prediction accuracy with less training time than the state-of-the-art large-scale tensor decomposition method, GigaTensor.
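
As a rough illustration of two ideas named in the abstract, the sketch below evaluates entries of a 3-mode array under a plain multilinear Tucker model and shows how a subarray only touches a subset of the latent factor rows. It is not the DinTucker algorithm: there is no Gaussian process, no variational inference, and no distributed stochastic gradient descent here. All function names, dimensions, and ranks are assumptions chosen for illustration.

```python
import random


def tucker_entry(core, factors, index):
    """Return one entry of a 3-mode tensor under a multilinear Tucker model."""
    # Pick the latent factor row for this entry in each of the three modes.
    u1, u2, u3 = (factors[k][index[k]] for k in range(3))
    return sum(
        core[a][b][c] * u1[a] * u2[b] * u3[c]
        for a in range(len(u1))
        for b in range(len(u2))
        for c in range(len(u3))
    )


def sample_subarray_indices(dims, sub_dims, rng):
    """Sample a sorted index subset per mode; together they select a subarray."""
    return [sorted(rng.sample(range(d), s)) for d, s in zip(dims, sub_dims)]


def subarray_factors(factors, sub_idx):
    """Keep only the latent factor rows that the subarray touches."""
    return [[factors[k][i] for i in sub_idx[k]] for k in range(3)]


# Hypothetical sizes: a 4 x 5 x 6 array with a 2 x 2 x 2 core.
rng = random.Random(0)
dims, ranks = (4, 5, 6), (2, 2, 2)
factors = [[[rng.gauss(0, 1) for _ in range(r)] for _ in range(d)]
           for d, r in zip(dims, ranks)]
core = [[[rng.gauss(0, 1) for _ in range(ranks[2])]
         for _ in range(ranks[1])] for _ in range(ranks[0])]

sub_idx = sample_subarray_indices(dims, (2, 3, 3), rng)
local = subarray_factors(factors, sub_idx)

# Local entry (0, 1, 2) of the subarray maps to this entry of the full array.
full_index = (sub_idx[0][0], sub_idx[1][1], sub_idx[2][2])
print(tucker_entry(core, local, (0, 1, 2)) == tucker_entry(core, factors, full_index))
```

Because each subarray depends only on its own rows of the latent factors, a worker could in principle fit a model on a subarray without holding the full array, which is the intuition behind learning locally and then combining the local results.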

Published

2016-03-02

How to Cite

Zhe, S., Qi, Y., Park, Y., Xu, Z., Molloy, I., & Chari, S. (2016). DinTucker: Scaling Up Gaussian Process Models on Large Multidimensional Arrays. Proceedings of the AAAI Conference on Artificial Intelligence, 30(1). https://doi.org/10.1609/aaai.v30i1.10222

Issue

Vol. 30 No. 1 (2016)

Section

Technical Papers: Machine Learning Methods