Feature Sampling Based Unsupervised Semantic Clustering for Real Web Multi-View Content

Xiaolong Gong; Linpeng Huang; Fuwei Wang

doi:10.1609/aaai.v33i01.3301102

Authors

Xiaolong Gong, Shanghai Jiao Tong University
Linpeng Huang, Shanghai Jiao Tong University
Fuwei Wang, Shanghai Jiao Tong University

DOI:

https://doi.org/10.1609/aaai.v33i01.3301102

Abstract

Real web datasets are often associated with multiple views, such as long and short commentaries, user preferences, and so on. However, with the rapid growth of user-generated texts, each view of the dataset has a large feature space, which creates a computational challenge during the matrix decomposition process. In this paper, we propose a novel multi-view clustering algorithm based on non-negative matrix factorization that uses a feature sampling strategy to reduce the complexity of the iteration process. In particular, our method exploits unsupervised semantic information in the learning process to capture the intrinsic similarity through a graph regularization. Moreover, we use the Hilbert-Schmidt Independence Criterion (HSIC) to explore the unsupervised semantic diversity information among the multi-view contents of one web item. The overall objective is to minimize the loss function of multi-view non-negative matrix factorization combined with an intra-semantic similarity graph regularizer and an inter-semantic diversity term. We demonstrate the effectiveness of the proposed method against several state-of-the-art methods on a large real-world dataset, Doucom, and three smaller datasets.
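As an illustration of the inter-view diversity term mentioned in the abstract, the following is a minimal sketch of the standard biased empirical HSIC estimator, HSIC(K, L) = tr(KHLH) / (n - 1)^2, where H is the centering matrix. The abstract does not state which kernel or estimator the paper uses, so the linear kernel and the function names `hsic` and `linear_kernel` are assumptions made for this example and not the authors' implementation.

```python
import numpy as np


def linear_kernel(V):
    """Gram matrix of a linear kernel over the rows of V (an assumed kernel choice)."""
    return V @ V.T


def hsic(K, L):
    """Biased empirical HSIC between two n x n Gram matrices K and L."""
    n = K.shape[0]
    # Centering matrix H = I - (1/n) * 1 1^T
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2


# Two hypothetical per-view representations of the same four items.
V1 = np.array([[1.0], [-1.0], [1.0], [-1.0]])
V2 = np.array([[1.0], [1.0], [-1.0], [-1.0]])

K1 = linear_kernel(V1)
K2 = linear_kernel(V2)

dependence_between_views = hsic(K1, K2)  # small value: the views are diverse
dependence_with_itself = hsic(K1, K1)    # larger value: identical representations
```

A smaller HSIC value between two views' representations indicates weaker statistical dependence, which is why a term of this kind can serve as a diversity penalty when it is added to a factorization loss.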
