Linear Kernel Tests via Empirical Likelihood for High-Dimensional Data

Lizhong Ding; Zhi Liu; Yu Li; Shizhong Liao; Yong Liu; Peng Yang; Ge Yu; Ling Shao; Xin Gao

doi:10.1609/aaai.v33i01.33013454

Authors

Lizhong Ding, Inception Institute of Artificial Intelligence
Zhi Liu, University of Macau
Yu Li, King Abdullah University of Science and Technology
Shizhong Liao, Tianjin University
Yong Liu, Chinese Academy of Sciences
Peng Yang, King Abdullah University of Science and Technology
Ge Yu, Chinese Academy of Sciences
Ling Shao, Inception Institute of Artificial Intelligence
Xin Gao, King Abdullah University of Science and Technology

Abstract

We propose an empirical likelihood framework for analyzing and comparing distributions without imposing any parametric assumptions. The framework is used to study two fundamental statistical testing problems: the two-sample test and the goodness-of-fit test. In the two-sample test, the task is to determine whether two groups of samples are drawn from different distributions; in the goodness-of-fit test, we examine how likely it is that a set of samples was generated from a known target distribution. Specifically, we propose empirical likelihood ratio (ELR) statistics for the two-sample test and the goodness-of-fit test, both of which have linear time complexity and show higher power (i.e., the probability of correctly rejecting the null hypothesis) than existing linear statistics on high-dimensional data. We prove nonparametric Wilks’ theorems for the ELR statistics, which show that the limiting distributions of the proposed ELR statistics are chi-square distributions. With these limiting distributions, the threshold for rejecting the null hypothesis can be determined without bootstrapping or simulation, which makes the ELR statistics more efficient than a recently proposed linear statistic, the finite set Stein discrepancy (FSSD). We also prove the consistency of the ELR statistics, which guarantees that the test power goes to 1 as the number of samples goes to infinity. In addition, we demonstrate experimentally and analyze theoretically that FSSD performs poorly, or even fails, on high-dimensional data. Finally, we conduct a series of experiments to evaluate the performance of our ELR statistics against state-of-the-art linear statistics.
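To illustrate how a chi-square limit removes the need for bootstrapping, the following is a minimal sketch of the classical empirical likelihood ratio test for a scalar mean, whose statistic is asymptotically chi-square with one degree of freedom under the null. It is not the kernel-based ELR statistic proposed in the paper, whose construction and degrees of freedom are not given in this abstract. The function names `el_ratio_statistic` and `chi2_1_threshold`, the bisection solver, and the sample data are assumptions made for illustration only.

```python
import math
from statistics import NormalDist


def el_ratio_statistic(xs, mu0, iters=100):
    """Classical empirical likelihood ratio statistic -2 log R(mu0) for a scalar mean.

    Illustrative only: this is the textbook one-dimensional case,
    not the paper's kernel-based ELR statistic.
    """
    z = [x - mu0 for x in xs]
    z_min, z_max = min(z), max(z)
    # mu0 must lie strictly inside the range of the data; otherwise no valid
    # weights exist and the statistic is taken to be +infinity.
    if not (z_min < 0.0 < z_max):
        return math.inf
    # The Lagrange multiplier lam solves sum(z_i / (1 + lam * z_i)) = 0 on the
    # open interval where every 1 + lam * z_i is positive. The left-hand side
    # is strictly decreasing in lam, so bisection finds the unique root.
    lo, hi = -1.0 / z_max, -1.0 / z_min
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g = sum(zi / (1.0 + mid * zi) for zi in z)
        if g > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)


def chi2_1_threshold(alpha):
    """(1 - alpha) quantile of the chi-square distribution with 1 degree of freedom.

    Uses the fact that the square of a standard normal variable is chi-square(1).
    """
    return NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2


# Hypothetical sample data for illustration.
samples = [1.0, 2.0, 3.0, 4.0, 5.0]
threshold = chi2_1_threshold(0.05)
statistic = el_ratio_statistic(samples, mu0=3.0)
reject_null = statistic > threshold
```

The rejection threshold here comes directly from a chi-square quantile, so no resampling loop is needed; the only iterative step is the one-dimensional root search for the multiplier.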
