From Points to Coalitions: Hierarchical Contrastive Shapley Values for Prioritizing Data Samples

Authors

  • Canran Xiao, Shenzhen Campus of Sun Yat-sen University
  • Jiabao Dou, Hong Kong Baptist University
  • Zhiming Lin, Nankai University
  • Zong Ke, National University of Singapore
  • Liwei Hou, Hunan University

DOI:

https://doi.org/10.1609/aaai.v40i19.38633

Abstract

How should we quantify the value of each training example when datasets are large, heterogeneous, and geometrically structured? Classical Data-Shapley answers this in principle, but its O(n!) complexity and point-wise perspective are ill-suited to modern scales. We propose Hierarchical Contrastive Data Valuation (HCDV), a three-stage framework that (i) learns a contrastive, geometry-preserving representation, (ii) organizes the data into a balanced coarse-to-fine hierarchy of clusters, and (iii) assigns Shapley-style pay-offs to coalitions via local Monte-Carlo games whose budgets are propagated downward. HCDV collapses the factorial burden to O(T Σ_ℓ K_ℓ) = O(T K_max log n), rewards examples that sharpen decision boundaries, and regularizes outliers through curvature-based smoothness. We prove that HCDV approximately satisfies the four Shapley axioms with surplus loss O(η log n), enjoys sub-Gaussian coalition deviation Õ(1/√T), and incurs at most k·ε_∞ regret for top-k selection. Experiments on four benchmarks (tabular, vision, streaming, and a 45M-sample CTR task) plus the OpenDataVal suite show that HCDV lifts accuracy by up to 5 pp, cuts valuation time by up to 100×, and directly supports tasks such as augmentation filtering, low-latency streaming updates, and fair marketplace payouts.
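
As a rough illustration of the Monte-Carlo games in stage (iii), the sketch below estimates Shapley values for a small set of coalitions by permutation sampling. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `mc_shapley`, the toy utility `additive`, and all parameters are hypothetical, and the contrastive embedding, the cluster hierarchy, and the downward budget propagation are not shown. With K players and T sampled permutations it makes about T·K utility calls, which matches the linear-in-T, linear-in-K cost per level quoted in the abstract.

```python
import random
from typing import Callable, FrozenSet, List


def mc_shapley(num_players: int,
               utility: Callable[[FrozenSet[int]], float],
               num_permutations: int,
               seed: int = 0) -> List[float]:
    """Permutation-sampling estimate of Shapley values.

    Players stand in for cluster-level coalitions (an assumption made
    for illustration). `utility` maps a set of player indices to a score.
    """
    rng = random.Random(seed)
    values = [0.0] * num_players
    order = list(range(num_players))
    empty_score = utility(frozenset())  # utility of the empty coalition

    for _ in range(num_permutations):
        rng.shuffle(order)              # one random arrival order
        members = set()
        prev = empty_score
        for p in order:
            members.add(p)
            cur = utility(frozenset(members))
            values[p] += cur - prev     # marginal contribution of p
            prev = cur

    # Average the marginal contributions over all sampled permutations.
    return [v / num_permutations for v in values]


# Toy additive game: each player's Shapley value equals its own weight.
weights = [1.0, 2.0, 3.0]


def additive(coalition: FrozenSet[int]) -> float:
    return sum(weights[i] for i in coalition)


est = mc_shapley(len(weights), additive, num_permutations=20)
print(est)
```

In an additive game every marginal contribution equals the player's own weight, so the estimate is exact for any T. For general utilities only the efficiency property holds exactly per permutation (the values sum to the utility of the full set minus that of the empty set), while individual values carry sampling error that shrinks as T grows.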

Published

2026-03-14

How to Cite

Xiao, C., Dou, J., Lin, Z., Ke, Z., & Hou, L. (2026). From Points to Coalitions: Hierarchical Contrastive Shapley Values for Prioritizing Data Samples. Proceedings of the AAAI Conference on Artificial Intelligence, 40(19), 15995-16003. https://doi.org/10.1609/aaai.v40i19.38633

Section

AAAI Technical Track on Data Mining & Knowledge Management III