SimCTC: A Simple Contrast Learning Method of Text Clustering (Student Abstract)

Authors

  • Chen Li Sichuan University
  • Xiaoguang Yu JD AI Research
  • Shuangyong Song JD AI Research
  • Jia Wang JD AI Research
  • Bo Zou JD AI Research
  • Xiaodong He JD AI Research

DOI:

https://doi.org/10.1609/aaai.v36i11.21635

Keywords:

AI Architectures, Knowledge Representation, Machine Learning

Abstract

This paper presents SimCTC, a simple contrastive learning (CL) framework that substantially advances the state of the art in text clustering. In SimCTC, a pre-trained BERT model first maps the input sequence into a representation space, which is then followed by three loss-function heads: a Clustering head, an Instance-CL head, and a Cluster-CL head. Experimental results on multiple benchmark datasets demonstrate that SimCTC outperforms 6 competitive text clustering methods, with a 1%-6% improvement in Accuracy (ACC) and a 1%-4% improvement in Normalized Mutual Information (NMI). Moreover, our results also show that clustering performance can be further improved by choosing an appropriate number of clusters for the cluster-level objective.
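The Instance-CL head mentioned above is an instance-level contrastive objective: two augmented views of the same sentence form a positive pair, and all other sentences in the batch act as negatives. The sketch below illustrates this with an NT-Xent-style loss in plain Python; the exact loss formulation, temperature value, and function names are assumptions for illustration, not details taken from the paper.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def instance_cl_loss(view1, view2, temperature=0.5):
    """NT-Xent-style instance-level contrastive loss (illustrative).

    view1[i] and view2[i] are embeddings of two augmentations of
    sentence i; every other representation in the batch is a negative.
    """
    n = len(view1)
    reps = view1 + view2                # 2n representations in total
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)           # index of the positive partner
        pos = math.exp(cosine(reps[i], reps[j]) / temperature)
        denom = sum(math.exp(cosine(reps[i], reps[k]) / temperature)
                    for k in range(2 * n) if k != i)
        total += -math.log(pos / denom)
    return total / (2 * n)
```

When the two views of each sentence agree (embeddings aligned), the loss is lower than when positives are mismatched, which is the signal that pulls augmentations of the same text together in representation space.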

Published

2022-06-28

How to Cite

Li, C., Yu, X., Song, S., Wang, J., Zou, B., & He, X. (2022). SimCTC: A Simple Contrast Learning Method of Text Clustering (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 36(11), 12997-12998. https://doi.org/10.1609/aaai.v36i11.21635