Principal Component Trees and Their Persistent Homology

Authors

  • Ben Kizaric, University of Wisconsin - Madison
  • Daniel Pimentel-Alarcón, University of Wisconsin - Madison

DOI:

https://doi.org/10.1609/aaai.v38i12.29222

Keywords:

ML: Dimensionality Reduction/Feature Selection, DMKM: Data Compression, DMKM: Data Visualization & Summarization, ML: Clustering, ML: Graph-based Machine Learning, ML: Learning with Manifolds, ML: Representation Learning, ML: Unsupervised & Self-Supervised Learning

Abstract

Low-dimensional models like PCA are often used to simplify complex datasets by learning a single approximating subspace. This paradigm has expanded to union-of-subspaces models, like those learned by subspace clustering. In this paper, we present Principal Component Trees (PCTs), a graph structure that generalizes these ideas to identify mixtures of components that together describe the subspace structure of high-dimensional datasets. Each node in a PCT corresponds to a principal component of the data, and the edges between nodes indicate the components that must be mixed to produce a subspace that approximates a portion of the data. To construct PCTs, we propose two angle-distribution hypothesis tests to detect subspace clusters in the data. To analyze, compare, and select the best PCT model, we define two persistent homology measures that describe their shape. We show that our construction yields two key properties of PCTs, namely ancestral orthogonality and non-decreasing singular values. Our main theoretical results show that learning PCTs reduces to PCA under multivariate normality, and that PCTs are efficient parameterizations of intersecting unions of subspaces. Finally, we use PCTs to analyze neural network latent spaces, word embeddings, and reference image datasets.
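To make the tree structure described in the abstract concrete, the sketch below is an illustrative (not the authors') Python rendering of the core idea: each node stores one principal component, and the subspace associated with a node is the span of the components along its root-to-node path, which should satisfy ancestral orthogonality. The names PCTNode, path_basis, and is_ancestrally_orthogonal are hypothetical and chosen only for this example.

    import numpy as np

    class PCTNode:
        # Illustrative node: holds one principal component (unit vector) and its children.
        def __init__(self, component, parent=None):
            self.component = np.asarray(component, dtype=float)
            self.component /= np.linalg.norm(self.component)  # store as a unit vector
            self.parent = parent
            self.children = []

        def add_child(self, component):
            child = PCTNode(component, parent=self)
            self.children.append(child)
            return child

        def path_basis(self):
            # Stack the components on the root-to-node path: per the abstract, the edges
            # indicate which components are mixed to form the approximating subspace.
            comps, node = [], self
            while node is not None:
                comps.append(node.component)
                node = node.parent
            return np.stack(comps[::-1])  # shape: (depth, ambient_dim)

        def is_ancestrally_orthogonal(self, tol=1e-8):
            # Check the "ancestral orthogonality" property along this path.
            B = self.path_basis()
            return np.allclose(B @ B.T, np.eye(len(B)), atol=tol)

    # Toy usage in R^3: root component e1 with one child branch holding e2.
    root = PCTNode([1.0, 0.0, 0.0])
    child = root.add_child([0.0, 1.0, 0.0])
    x = np.array([2.0, -1.0, 0.5])
    B = child.path_basis()           # basis of the 2-D path subspace
    x_hat = B.T @ (B @ x)            # projection of x onto that subspace
    print(child.is_ancestrally_orthogonal(), x_hat)

This sketch only models the data structure and its path-subspace projection; the paper's construction of the tree (via the angle-distribution hypothesis tests) and its persistent homology measures are not reproduced here.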

Published

2024-03-24

How to Cite

Kizaric, B., & Pimentel-Alarcón, D. (2024). Principal Component Trees and Their Persistent Homology. Proceedings of the AAAI Conference on Artificial Intelligence, 38(12), 13220-13229. https://doi.org/10.1609/aaai.v38i12.29222

Section

AAAI Technical Track on Machine Learning III