An Euclidean Distance Based on Tensor Product Graph Diffusion Related Attribute Value Embedding for Nominal Data Clustering

Lei Gu; Ningning Zhou; Yang Zhao

doi:10.1609/aaai.v32i1.11681

Authors

Lei Gu Nanjing University of Posts and Telecommunications
Ningning Zhou Nanjing University of Posts and Telecommunications
Yang Zhao Nanjing Forestry University

DOI:

https://doi.org/10.1609/aaai.v32i1.11681

Keywords:

Nominal data, Clustering

Abstract

Not like numerical data clustering, nominal data clustering is a very difficult problem because there exists no natural relative ordering between nominal attribute values. This paper mainly aims to make the Euclidean distance measure appropriate to nominal data clustering, and the core idea is the attribute value embedding, namely, transforming each nominal attribute value into a numerical vector. This embedding method consists of four steps. In the first step, the weights, which can quantify the amount of information in attribute values, is calculated for each value in each nominal attribute based on each object and its k nearest neighbors. In the second step, an intra-attribute value similarity matrix is created for each nominal attribute by using the attribute value's weights. In the third step, for each nominal attribute, we find another attribute with the maximal dependence on it, and build an inter-attribute value similarity matrix on the basis of the attribute value's weights related to these two attributes. In the last step, a diffusion matrix of each nominal attribute is constructed by the tensor product graph diffusion process, and this step can cause the acquired value embedding to contain simultaneously the intra- and inter-attribute value similarities information. To evaluate the effectiveness of our proposed method, experiments are done on 10 data sets. Experimental results demonstrate that our method not only enables the Euclidean distance to be used for nominal data clustering, but also can acquire the better clustering performance than several existing state-of-the-art approaches.

An Euclidean Distance Based on Tensor Product Graph Diffusion Related Attribute Value Embedding for Nominal Data Clustering

Authors

DOI:

Keywords:

Abstract

Downloads

Published

How to Cite

Issue

Section

Information