From Points to Coalitions: Hierarchical Contrastive Shapley Values for Prioritizing Data Samples
DOI:
https://doi.org/10.1609/aaai.v40i19.38633
Abstract
How should we quantify the value of each training example when datasets are large, heterogeneous, and geometrically structured? Classical Data-Shapley answers in principle, but its O(n!) complexity and point-wise perspective are ill-suited to modern scales. We propose Hierarchical Contrastive Data Valuation (HCDV), a three-stage framework that (i) learns a contrastive, geometry-preserving representation, (ii) organizes the data into a balanced coarse-to-fine hierarchy of clusters, and (iii) assigns Shapley-style pay-offs to coalitions via local Monte-Carlo games whose budgets are propagated downward. HCDV collapses the factorial burden to O(T∑ℓKℓ) = O(TKmax log n), rewards examples that sharpen decision boundaries, and regularizes outliers through curvature-based smoothness. We prove that HCDV approximately satisfies the four Shapley axioms with surplus loss O(η log n), enjoys sub-Gaussian coalition deviation Õ(1/√T), and incurs at most kε∞ regret for top-k selection. Experiments on four benchmarks (tabular, vision, streaming, and a 45M-sample CTR task) plus the OpenDataVal suite show that HCDV lifts accuracy by up to +5 pp, slashes valuation time by up to 100×, and directly supports tasks such as augmentation filtering, low-latency streaming updates, and fair marketplace payouts.
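As a rough illustration of the "local Monte-Carlo games" mentioned in the abstract, the sketch below estimates Shapley values for a handful of clusters at one level of a hierarchy using standard permutation sampling. It is not the authors' implementation: the function name `mc_shapley`, the cluster labels, and the additive toy utility are all hypothetical, and the contrastive representation, hierarchy construction, and downward budget propagation are not modeled.

```python
import random
from typing import Callable, Dict, FrozenSet, Hashable, Sequence


def mc_shapley(
    players: Sequence[Hashable],
    utility: Callable[[FrozenSet[Hashable]], float],
    num_permutations: int = 200,
    seed: int = 0,
) -> Dict[Hashable, float]:
    """Permutation-sampling estimate of Shapley values.

    `players` are treated as coalitions (e.g. clusters at one level);
    `utility` maps a set of players to a score. Both are assumptions
    made for this sketch, not names taken from the paper.
    """
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(num_permutations):
        rng.shuffle(order)
        included = set()
        previous = utility(frozenset(included))
        for p in order:
            included.add(p)
            current = utility(frozenset(included))
            # Marginal contribution of p given the players before it.
            totals[p] += current - previous
            previous = current
    return {p: total / num_permutations for p, total in totals.items()}


# Toy additive utility (hypothetical): each cluster contributes a fixed amount.
toy_weights = {"cluster_a": 0.5, "cluster_b": 0.3, "cluster_c": 0.2}


def toy_utility(coalition: FrozenSet[Hashable]) -> float:
    return sum(toy_weights[c] for c in coalition)


estimates = mc_shapley(list(toy_weights), toy_utility, num_permutations=50)
print(estimates)
```

With T sampled permutations over K players, the loop evaluates the utility T(K + 1) times, which is the per-level cost that the abstract's O(T∑ℓKℓ) bound adds up across levels. The marginal contributions within each permutation telescope, so the estimates always sum to the utility of the full set minus that of the empty set.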
Published
2026-03-14
How to Cite
Xiao, C., Dou, J., Lin, Z., Ke, Z., & Hou, L. (2026). From Points to Coalitions: Hierarchical Contrastive Shapley Values for Prioritizing Data Samples. Proceedings of the AAAI Conference on Artificial Intelligence, 40(19), 15995-16003. https://doi.org/10.1609/aaai.v40i19.38633
Section
AAAI Technical Track on Data Mining & Knowledge Management III