Naming the Most Anomalous Cluster in Hilbert Space for Structures with Attribute Information

Authors

  • Janis Kalofolias, CISPA Helmholtz Center for Information Security
  • Jilles Vreeken, CISPA Helmholtz Center for Information Security

DOI:

https://doi.org/10.1609/aaai.v36i4.20323

Keywords:

Data Mining & Knowledge Management (DMKM), Machine Learning (ML)

Abstract

We consider datasets consisting of arbitrarily structured entities (e.g., molecules, sequences, graphs, etc.) whose similarity can be assessed with a reproducing kernel (or a family thereof). These entities are assumed to additionally have a set of named attributes (e.g., number_of_atoms, stock_price, etc.). These attributes can be used to classify the structured entities into discrete sets (e.g., ‘number_of_atoms < 3’, ‘stock_price ≤ 100’, etc.) and can effectively serve as Boolean predicates. Our goal is to use this side-information to provide explainable kernel-based clustering. To this end, we propose a method which is able to find, among all possible entity subsets that can be described as a conjunction of the available predicates, either a) the optimal cluster within the Reproducing Kernel Hilbert Space, or b) the most anomalous subset within the same space. Our method employs combinatorial optimisation via an adaptation of the Maximum-Mean-Discrepancy measure that captures the above intuition. Finally, we propose a criterion to select the optimal one out of a family of kernels in a way that preserves the available side-information. We provide several real-world datasets that demonstrate the usefulness of our proposed method.
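
To make the idea of scoring predicate-defined subsets in kernel space concrete, the following is a minimal illustrative sketch and not the authors' algorithm. It assumes a plain biased estimate of the squared Maximum Mean Discrepancy between a subset and its complement (the paper uses its own adaptation of the measure), a brute-force enumeration of short conjunctions in place of the paper's combinatorial optimisation, and an RBF kernel on toy one-dimensional data. All function names, predicate names, and parameters here are hypothetical.

```python
from itertools import combinations

import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Gram matrix of an RBF kernel (an assumed kernel choice for this toy example)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def mmd2(K, mask):
    """Biased squared MMD between the entities selected by `mask` and the rest."""
    a, b = mask, ~mask
    return (K[np.ix_(a, a)].mean()
            + K[np.ix_(b, b)].mean()
            - 2.0 * K[np.ix_(a, b)].mean())


def most_anomalous_conjunction(K, predicates, max_len=2, min_size=2):
    """Exhaustively score every conjunction of up to `max_len` predicates.

    `predicates` maps a name to a Boolean array over the entities.
    Returns (best score, tuple of predicate names).
    """
    best_score, best_combo = -np.inf, None
    for r in range(1, max_len + 1):
        for combo in combinations(predicates, r):
            mask = np.logical_and.reduce([predicates[p] for p in combo])
            # Skip subsets (or complements) too small to estimate a mean from.
            if mask.sum() < min_size or (~mask).sum() < min_size:
                continue
            score = mmd2(K, mask)
            if score > best_score:
                best_score, best_combo = score, combo
    return best_score, best_combo


# Toy data: six entities near 0 and four near 5 (hypothetical values).
X = np.array([[0.0], [0.1], [0.2], [0.3], [0.4],
              [0.5], [5.0], [5.1], [5.2], [5.3]])
K = rbf_kernel(X)
predicates = {
    "x_ge_5": X[:, 0] >= 5.0,                 # attribute threshold predicate
    "even_index": np.arange(len(X)) % 2 == 0,  # predicate unrelated to structure
}
score, combo = most_anomalous_conjunction(K, predicates)
print(combo, round(float(score), 3))
```

In this toy setting the selected description is the threshold predicate that isolates the far-away group, which is the kind of named, human-readable result the abstract describes. Exhaustive enumeration grows exponentially with the number of predicates, so it is only meant to illustrate the objective, not a practical search strategy.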

Published

2022-06-28

How to Cite

Kalofolias, J., & Vreeken, J. (2022). Naming the Most Anomalous Cluster in Hilbert Space for Structures with Attribute Information. Proceedings of the AAAI Conference on Artificial Intelligence, 36(4), 4057-4064. https://doi.org/10.1609/aaai.v36i4.20323

Section

AAAI Technical Track on Data Mining and Knowledge Management