TY  - JOUR
AU  - Dalleiger, Sebastian
AU  - Vreeken, Jilles
PY  - 2020/04/03
Y2  - 2024/03/29
TI  - Explainable Data Decompositions
JF  - Proceedings of the AAAI Conference on Artificial Intelligence
JA  - AAAI
VL  - 34
IS  - 04
SE  - AAAI Technical Track: Machine Learning
DO  - 10.1609/aaai.v34i04.5780
UR  - https://ojs.aaai.org/index.php/AAAI/article/view/5780
SP  - 3709-3716
AB  - <p>Our goal is to discover the components of a dataset, characterize <em>why</em> we deem these components, explain <em>how</em> these components are different from each other, as well as identify what properties they <em>share</em> among each other. As is usual, we consider regions in the data to be components if they show significantly different distributions. What is not usual, however, is that we parameterize these distributions with patterns that are informative for one or more components. We do so because these patterns allow us to characterize what is going on in our data as well as explain our decomposition.</p><p>We define the problem in terms of a regularized maximum likelihood, in which we use the Maximum Entropy principle to model each data component with a set of patterns. As the search space is large and unstructured, we propose the deterministic DISC algorithm to efficiently discover high-quality decompositions via an alternating optimization approach. Empirical evaluation on synthetic and real-world data shows that DISC efficiently discovers meaningful components and accurately characterises these in easily understandable terms.</p>
ER  -