Principle Component Trees and Their Persistent Homology
DOI:
https://doi.org/10.1609/aaai.v38i12.29222Keywords:
ML: Dimensionality Reduction/Feature Selection, DMKM: Data Compression, DMKM: Data Visualization & Summarization, ML: Clustering, ML: Graph-based Machine Learning, ML: Learning with Manifolds, ML: Representation Learning, ML: Unsupervised & Self-Supervised LearningAbstract
Low dimensional models like PCA are often used to simplify complex datasets by learning a single approximating subspace. This paradigm has expanded to union of subspaces models, like those learned by subspace clustering. In this paper, we present Principal Component Trees (PCTs), a graph structure that generalizes these ideas to identify mixtures of components that together describe the subspace structure of high-dimensional datasets. Each node in a PCT corresponds to a principal component of the data, and the edges between nodes indicate the components that must be mixed to produce a subspace that approximates a portion of the data. In order to construct PCTs, we propose two angle-distribution hypothesis tests to detect subspace clusters in the data. To analyze, compare, and select the best PCT model, we define two persistent homology measures that describe their shape. We show our construction yields two key properties of PCTs, namely ancestral orthogonality and non-decreasing singular values. Our main theoretical results show that learning PCTs reduces to PCA under multivariate normality, and that PCTs are efficient parameterizations of intersecting union of subspaces. Finally, we use PCTs to analyze neural network latent space, word embeddings, and reference image datasets.Downloads
Published
2024-03-24
How to Cite
Kizaric, B., & Pimentel-Alarcón, D. (2024). Principle Component Trees and Their Persistent Homology. Proceedings of the AAAI Conference on Artificial Intelligence, 38(12), 13220-13229. https://doi.org/10.1609/aaai.v38i12.29222
Issue
Section
AAAI Technical Track on Machine Learning III