Defense Against Model Stealing Based on Account-Aware Distribution Discrepancy

Authors

  • Jian-Ping Mei, Zhejiang University of Technology
  • Weibin Zhang, Zhejiang University of Technology
  • Jie Chen, Zhejiang University of Technology
  • Xuyun Zhang, Macquarie University
  • Tiantian Zhu, Zhejiang University of Technology

DOI:

https://doi.org/10.1609/aaai.v39i1.32041

Abstract

Malicious users attempt to functionally replicate commercial models at low cost by training a clone model on query responses. Preventing such model-stealing attacks in a timely manner, while providing strong protection and maintaining utility for benign users, is challenging. In this paper, we propose a novel non-parametric detector called Account-aware Distribution Discrepancy (ADD) that recognizes queries from malicious users by leveraging account-wise local dependency. We model each class as a Multivariate Normal distribution (MVN) in the feature space and compute the malicious score as the weighted sum of class-wise distribution discrepancies. The ADD detector is combined with random-based prediction poisoning to yield a plug-and-play defense module, named D-ADD, for image classification models. Extensive experimental results show that D-ADD achieves strong defense against different types of attacks with little interference in serving benign users, in both soft-label and hard-label settings.
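The detector described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the paper's exact method: the choice of KL divergence as the class-wise discrepancy measure, the query-mass weighting, and the covariance regularization `eps` are all assumptions made here for a self-contained example.

```python
import numpy as np

def fit_class_mvns(features, labels, num_classes, eps=1e-3):
    """Fit one Multivariate Normal per class from benign reference features."""
    mvns = []
    for c in range(num_classes):
        x = features[labels == c]
        mu = x.mean(axis=0)
        # Regularize the covariance so it stays invertible (assumption).
        cov = np.cov(x, rowvar=False) + eps * np.eye(x.shape[1])
        mvns.append((mu, cov))
    return mvns

def kl_mvn(mu0, cov0, mu1, cov1):
    """KL divergence KL(N0 || N1) between two Multivariate Normals."""
    d = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    trace_term = np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d
    logdet_term = np.log(np.linalg.det(cov1) / np.linalg.det(cov0))
    return 0.5 * (trace_term + logdet_term)

def malicious_score(account_features, account_preds, mvns, eps=1e-3):
    """Sum of weighted class-wise discrepancies between an account's
    per-class query feature distribution and the reference class MVNs."""
    score, n = 0.0, len(account_preds)
    for c, (mu_c, cov_c) in enumerate(mvns):
        x = account_features[account_preds == c]
        if len(x) < 2:
            continue  # too few queries on this class to estimate an MVN
        mu = x.mean(axis=0)
        cov = np.cov(x, rowvar=False) + eps * np.eye(x.shape[1])
        weight = len(x) / n  # weight by the account's query mass on class c
        score += weight * kl_mvn(mu, cov, mu_c, cov_c)
    return score
```

Under this reading, an account whose queries follow the benign per-class feature distributions yields a low score, while an account issuing out-of-distribution extraction queries yields a high one; the score could then gate a random-based prediction-poisoning response.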

Published

2025-04-11

How to Cite

Mei, J.-P., Zhang, W., Chen, J., Zhang, X., & Zhu, T. (2025). Defense Against Model Stealing Based on Account-Aware Distribution Discrepancy. Proceedings of the AAAI Conference on Artificial Intelligence, 39(1), 604–611. https://doi.org/10.1609/aaai.v39i1.32041

Section

AAAI Technical Track on Application Domains