Revisiting Probability Distribution Assumptions for Information Theoretic Feature Selection

Yuan Sun; Wei Wang; Michael Kirley; Xiaodong Li; Jeffrey Chan

doi:10.1609/aaai.v34i04.6050

Authors

Yuan Sun, RMIT University
Wei Wang, University of Melbourne
Michael Kirley, University of Melbourne
Xiaodong Li, RMIT University
Jeffrey Chan, RMIT University

Abstract

Feature selection has been shown to benefit many data mining and machine learning tasks, especially big data analytics. Mutual Information (MI) is a well-known information-theoretic measure used to evaluate the relevance between feature subsets and class labels. However, estimating MI for high-dimensional feature subsets is difficult. Consequently, much research has focused on low-order MI approximations or on computing a lower bound on MI called Variational Information (VI). These methods often rely on assumptions about the probability distributions of features, chosen so that the distributions remain realistic yet tractable to compute. In this paper, we reveal two sets of distribution assumptions underlying many MI- and VI-based methods: the Feature Independence Distribution and the Geometric Mean Distribution. We systematically analyze their strengths and weaknesses and propose a logical extension, the Arithmetic Mean Distribution, which yields an unbiased and normalized estimate of probability densities. Detailed empirical studies on a suite of 29 real-world classification problems show that our methods improve prediction accuracy by identifying more informative features, supporting our theoretical findings.
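
A minimal, generic sketch in Python for discrete features may help make two points in the abstract concrete. It is not the authors' implementation, and the helper names (plugin_mi, entropy, arithmetic_mean, geometric_mean) are hypothetical. It illustrates two standard facts. First, the plug-in estimate of I(X_S; Y) degrades for high-dimensional subsets: once every sample has a unique feature vector, the estimate equals the empirical class entropy H(Y) regardless of how the labels relate to the features. Second, a pointwise geometric mean of distinct distributions has total mass below 1 (by the AM-GM inequality), whereas a pointwise arithmetic mean stays normalized. The paper's Geometric Mean and Arithmetic Mean Distributions are defined over feature subsets and class labels and may differ in detail from this toy averaging.

```python
import math
from collections import Counter


def entropy(labels):
    """Empirical (plug-in) entropy H(Y) in nats. Hypothetical helper."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def plugin_mi(features, labels):
    """Plug-in estimate of I(X_S; Y) in nats. Hypothetical helper.

    features: one tuple of discrete values (the subset X_S) per sample
    labels:   one discrete class label per sample
    """
    n = len(labels)
    joint = Counter(zip(features, labels))
    px = Counter(features)
    py = Counter(labels)
    # Sum over observed (x, y) of p(x, y) * log(p(x, y) / (p(x) p(y))), written with raw counts.
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def arithmetic_mean(dists):
    """Pointwise arithmetic mean of discrete distributions given as dicts."""
    keys = set().union(*dists)
    return {k: sum(d.get(k, 0.0) for d in dists) / len(dists) for k in keys}


def geometric_mean(dists):
    """Pointwise geometric mean of discrete distributions (not renormalized)."""
    keys = set().union(*dists)
    return {k: math.prod(d.get(k, 0.0) for d in dists) ** (1.0 / len(dists)) for k in keys}


if __name__ == "__main__":
    # 32 samples, each with a distinct 5-bit feature vector; labels follow an arbitrary pattern.
    feats = [tuple((i >> b) & 1 for b in range(5)) for i in range(32)]
    labels = ["pos" if i % 3 == 0 else "neg" for i in range(32)]
    # With every feature vector unique, the plug-in MI collapses to H(Y).
    print(plugin_mi(feats, labels), entropy(labels))

    p = {"a": 0.5, "b": 0.5}
    q = {"a": 0.9, "b": 0.1}
    print(sum(arithmetic_mean([p, q]).values()))  # total mass of the arithmetic mean
    print(sum(geometric_mean([p, q]).values()))   # total mass of the geometric mean, below 1 since p != q
```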
