Unbiased Multivariate Correlation Analysis

Authors

  • Yisen Wang, Tsinghua University
  • Simone Romano, University of Melbourne
  • Vinh Nguyen, University of Melbourne
  • James Bailey, University of Melbourne
  • Xingjun Ma, University of Melbourne
  • Shu-Tao Xia, Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v31i1.10778

Keywords:

multivariate correlation measure, bias analysis, statistical model of independence, subspace clustering, outlier detection

Abstract

Correlation measures are a key element of statistics and machine learning, and are essential for a wide range of data analysis tasks. Most existing correlation measures target pairwise relationships, but real-world data can also exhibit complex multivariate correlations involving three or more variables. We argue that multivariate correlation measures should be comparable, interpretable, scalable and unbiased. However, no existing measure satisfies all of these requirements. In this paper, we propose an unbiased multivariate correlation measure, called UMC, which satisfies all of the above criteria. UMC is a cumulative-entropy-based, non-parametric multivariate correlation measure that can capture both linear and non-linear correlations among groups of three or more variables. It employs a correction for chance, using a statistical model of independence, to address the issue of bias. UMC is highly interpretable, and we show empirically that it outperforms state-of-the-art multivariate correlation measures in statistical power, as well as in subspace clustering and outlier detection tasks.
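
The two ingredients named in the abstract, cumulative entropy and a correction for chance, can be illustrated with a minimal Python sketch. This is not the UMC definition from the paper: it covers only two variables, it conditions by equal-frequency binning, and it approximates the expected score under independence with random permutations, which is an assumed stand-in for the paper's statistical model of independence. All function names and parameters (`cumulative_entropy`, `raw_score`, `chance_corrected_score`, `bins`, `permutations`) are hypothetical.

```python
import math
import random


def cumulative_entropy(values):
    """Empirical cumulative entropy: -sum (x[i+1] - x[i]) * (i/n) * log(i/n)."""
    xs = sorted(values)
    n = len(xs)
    total = 0.0
    for i in range(1, n):
        p = i / n  # empirical CDF on the interval [xs[i-1], xs[i])
        total -= (xs[i] - xs[i - 1]) * p * math.log(p)
    return total


def conditional_cumulative_entropy(x, y, bins=5):
    """Weighted average cumulative entropy of x inside equal-frequency bins of y.

    Assumption: binning is an illustrative choice, not taken from the paper.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: y[i])
    total = 0.0
    for b in range(bins):
        idx = order[b * n // bins:(b + 1) * n // bins]
        if idx:
            total += len(idx) / n * cumulative_entropy([x[i] for i in idx])
    return total


def raw_score(x, y, bins=5):
    """Fraction of the cumulative entropy of x that is removed by knowing y."""
    hx = cumulative_entropy(x)
    if hx == 0.0:
        return 0.0
    return (hx - conditional_cumulative_entropy(x, y, bins)) / hx


def chance_corrected_score(x, y, bins=5, permutations=50, seed=0):
    """(observed - baseline) / (1 - baseline), with a permutation baseline.

    Assumption: the baseline is a Monte Carlo estimate of the expected raw
    score when y is shuffled, i.e. when x and y are made independent.
    """
    rng = random.Random(seed)
    observed = raw_score(x, y, bins)
    shuffled = list(y)
    baseline = 0.0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        baseline += raw_score(x, shuffled, bins)
    baseline /= permutations
    if baseline >= 1.0:
        return 0.0
    return (observed - baseline) / (1.0 - baseline)
```

On independent data the raw score tends to be slightly positive because each bin sees only a small sample; subtracting the permutation baseline pulls that value toward zero, which is the intuition behind correcting for chance.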

Published

2017-02-13

How to Cite

Wang, Y., Romano, S., Nguyen, V., Bailey, J., Ma, X., & Xia, S.-T. (2017). Unbiased Multivariate Correlation Analysis. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1). https://doi.org/10.1609/aaai.v31i1.10778