Variable Importance in High-Dimensional Settings Requires Grouping

Authors

  • Ahmad Chamma (Inria-Saclay, Palaiseau, France; Université Paris-Saclay; CEA Saclay)
  • Bertrand Thirion (Inria-Saclay, Palaiseau, France; Université Paris-Saclay; CEA Saclay)
  • Denis Engemann (Roche Pharma Research and Early Development, Neuroscience and Rare Diseases, Roche Innovation Center Basel, F. Hoffmann–La Roche Ltd., Basel, Switzerland)

DOI:

https://doi.org/10.1609/aaai.v38i10.28997

Keywords:

ML: Transparent, Interpretable, Explainable ML, ML: Classification and Regression, ML: Deep Learning Algorithms, ML: Dimensionality Reduction/Feature Selection, ML: Ensemble Methods

Abstract

Explaining the decision process of machine learning algorithms is nowadays crucial both for enhancing model performance and for human comprehension. This can be achieved by assessing the importance of single variables, even for high-capacity non-linear methods such as Deep Neural Networks (DNNs). While only removal-based approaches, such as Permutation Importance (PI), provide statistical validity, they return misleading results when variables are correlated. Conditional Permutation Importance (CPI) bypasses PI's limitations in such cases. However, in high-dimensional settings, where high correlations between variables cancel their conditional importance, CPI and other methods yield unreliable results, in addition to incurring prohibitive computational costs. Grouping variables, either statistically via clustering or based on prior knowledge, recovers some statistical power and leads to better interpretations. In this work, we introduce BCPI (Block-Based Conditional Permutation Importance), a new generic framework for variable importance computation with statistical guarantees that handles both single variables and groups. Furthermore, as handling groups of high cardinality (such as the set of observations of a given modality) is both time-consuming and resource-intensive, we also introduce a new stacking approach that extends the DNN architecture with sub-linear layers adapted to the group structure. We show that the resulting approach, extended with stacking, controls the type-I error even for highly correlated groups and achieves top accuracy across benchmarks. Finally, we perform a real-world data analysis on a large-scale medical dataset, where we aim to show the consistency between our results and the literature for biomarker prediction.
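The contrast between single-variable and grouped permutation that motivates the paper can be illustrated with a minimal marginal-permutation sketch in plain NumPy. This is not the paper's CPI/BCPI implementation (which permutes conditionally on the remaining variables and adds statistical guarantees); it only shows how jointly permuting a block of correlated features, rather than each feature alone, changes the measured loss increase. All variable names and parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Two highly correlated features forming one group, plus an independent one.
z = rng.normal(size=n)
x1 = z + 0.1 * rng.normal(size=n)
x2 = z + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = x1 + x2 + 0.5 * x3 + 0.1 * rng.normal(size=n)

# Ordinary least squares fit (a stand-in for any predictive model).
beta, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)

def mse(Xm):
    """Mean squared error of the fitted model on feature matrix Xm."""
    pred = beta[0] + Xm @ beta[1:]
    return np.mean((y - pred) ** 2)

base = mse(X)

def perm_importance(cols, n_rep=30):
    """Jointly permute the columns in `cols`; report the mean MSE increase."""
    deltas = []
    for _ in range(n_rep):
        idx = rng.permutation(n)
        Xp = X.copy()
        Xp[:, cols] = Xp[idx][:, cols]  # one shared permutation for the block
        deltas.append(mse(Xp) - base)
    return float(np.mean(deltas))

# Permuting x1 or x2 alone measures each variable separately, whereas
# permuting the group {x1, x2} as a block captures their shared signal.
print("x1 alone     :", perm_importance([0]))
print("x2 alone     :", perm_importance([1]))
print("group {x1,x2}:", perm_importance([0, 1]))
```

In this toy setup the block-level importance of {x1, x2} exceeds either single-variable score, which is the intuition behind grouping: importance carried jointly by correlated variables is visible at the group level even when per-variable scores are diluted.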

Published

2024-03-24

How to Cite

Chamma, A., Thirion, B., & Engemann, D. (2024). Variable Importance in High-Dimensional Settings Requires Grouping. Proceedings of the AAAI Conference on Artificial Intelligence, 38(10), 11195-11203. https://doi.org/10.1609/aaai.v38i10.28997

Section

AAAI Technical Track on Machine Learning I