Clustering Documents Along Multiple Dimensions

Sajib Dasgupta; Richard Golden; Vincent Ng

doi:10.1609/aaai.v26i1.8325

Authors

Sajib Dasgupta, IBM Almaden Research Center
Richard Golden, University of Texas at Dallas
Vincent Ng, University of Texas at Dallas

DOI:

https://doi.org/10.1609/aaai.v26i1.8325

Keywords:

clustering, text mining, natural language processing

Abstract

Traditional clustering algorithms are designed to search for a single clustering solution despite the fact that multiple alternative solutions might exist for a particular dataset. For example, a set of news articles might be clustered by topic or by the author's gender or age. Similarly, book reviews might be clustered by sentiment or comprehensiveness. In this paper, we address the problem of identifying alternative clustering solutions by developing a Probabilistic Multi-Clustering (PMC) model that discovers multiple, maximally different clusterings of a data sample. Empirical results on six datasets representative of real-world applications show that our PMC model exhibits superior performance to comparable multi-clustering algorithms.

Published

2021-09-20

How to Cite

Dasgupta, S., Golden, R., & Ng, V. (2021). Clustering Documents Along Multiple Dimensions. Proceedings of the AAAI Conference on Artificial Intelligence, 26(1), 879–885. https://doi.org/10.1609/aaai.v26i1.8325

Issue

Vol. 26 No. 1 (2012): Twenty-Sixth AAAI Conference on Artificial Intelligence

Section

AAAI Technical Track: Machine Learning
