Fused Feature Representation Discovery for High-Dimensional and Sparse Data

Authors

  • Jun Suzuki, NTT Communication Science Laboratories
  • Masaaki Nagata, NTT Communication Science Laboratories

DOI:

https://doi.org/10.1609/aaai.v28i1.8935

Keywords:

Feature Representation Discovery, Semi-supervised Learning, Natural Language Processing

Abstract

The automatic discovery of a significant low-dimensional feature representation from a given data set is a fundamental problem in machine learning. This paper focuses on developing feature representation discovery methods suited to high-dimensional and sparse data. We formulate feature representation discovery as a variant of the semi-supervised learning problem, namely, as an optimization problem over unsupervised data whose objective evaluates the impact of each feature on modeling the target task, as judged by an initial model constructed from supervised data. The most notable characteristic of our method is that it runs at a feasible speed even when the numbers of data points and features both reach the millions or even billions. It also yields a very small number of feature sets (fewer than 10) that offer better performance than the original feature sets. We demonstrate the effectiveness of our method in experiments on two well-studied natural language processing tasks.
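To make the idea of "fusing" millions of sparse features into fewer than 10 feature sets concrete, here is a toy sketch. It is not the optimization procedure of the paper. It assumes that each original feature already has a scalar impact score (for instance, a weight from an initial model trained on supervised data), and it uses a hypothetical stand-in grouping rule: sort features by score and cut them into `num_groups` equal-size bins. The names `assign_groups` and `fuse` are illustrative only. Once the grouping exists, each sparse example is mapped to a dense vector of length `num_groups` by summing the values of the features in each group.

```python
def assign_groups(scores: dict[str, float], num_groups: int) -> dict[str, int]:
    """Hypothetical grouping rule (an assumption, not the paper's method):
    sort features by their impact score and split them into num_groups
    contiguous bins of roughly equal size."""
    ordered = sorted(scores, key=lambda feat: scores[feat])
    n = len(ordered)
    # i * num_groups // n is always in [0, num_groups - 1] for 0 <= i < n.
    return {feat: i * num_groups // n for i, feat in enumerate(ordered)}


def fuse(example: dict[str, float], group_of: dict[str, int], num_groups: int) -> list[float]:
    """Map a sparse example (feature -> value) to a dense vector with one
    entry per fused feature set by summing the values of its members.
    Features without a group (e.g., unseen at grouping time) are dropped."""
    fused = [0.0] * num_groups
    for feat, value in example.items():
        group = group_of.get(feat)
        if group is not None:
            fused[group] += value
    return fused


if __name__ == "__main__":
    # Made-up scores for four NLP-style indicator features.
    scores = {"w=the": 0.01, "w=bank": 1.2, "w=river": -0.8, "suffix=ing": 0.3}
    group_of = assign_groups(scores, num_groups=2)
    print(group_of)
    print(fuse({"w=bank": 1.0, "w=the": 1.0, "w=unseen": 1.0}, group_of, 2))
```

A downstream model then only has to learn `num_groups` weights instead of one weight per original feature, which is why a handful of fused feature sets can still be trained and applied quickly on very large, sparse data.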


Published

2014-06-21

How to Cite

Suzuki, J., & Nagata, M. (2014). Fused Feature Representation Discovery for High-Dimensional and Sparse Data. Proceedings of the AAAI Conference on Artificial Intelligence, 28(1). https://doi.org/10.1609/aaai.v28i1.8935