PLMmark: A Secure and Robust Black-Box Watermarking Framework for Pre-trained Language Models

Authors

  • Peixuan Li, Shanghai Jiao Tong University
  • Pengzhou Cheng, Shanghai Jiao Tong University
  • Fangqi Li, Shanghai Jiao Tong University
  • Wei Du, Shanghai Jiao Tong University
  • Haodong Zhao, Shanghai Jiao Tong University
  • Gongshen Liu, Shanghai Jiao Tong University

DOI:

https://doi.org/10.1609/aaai.v37i12.26750

Keywords:

General

Abstract

The huge training overhead, considerable commercial value, and various potential security risks make it urgent to protect the intellectual property (IP) of Deep Neural Networks (DNNs). DNN watermarking has become a promising way to meet this need. However, most existing watermarking schemes target image classification tasks, the schemes designed for the textual domain lack security and reliability, and how to protect the IP of widely used pre-trained language models (PLMs) remains an open problem. To fill these gaps, we propose PLMmark, the first secure and robust black-box watermarking framework for PLMs. It consists of three phases: (1) To generate watermarks that carry the owner's identity information, we propose a novel encoding method that establishes a strong link between a digital signature and trigger words by leveraging the original vocabulary tables of PLMs; combining this with public-key cryptography ensures the security of the scheme. (2) To embed robust, task-agnostic, and highly transferable watermarks in PLMs, we introduce a supervised contrastive loss that pushes the output representations of trigger sets away from those of clean samples; the watermarked models therefore respond anomalously to the trigger sets, which identifies the ownership. (3) To make the ownership verification results reliable, we perform double verification, which guarantees the unforgeability of ownership. Extensive experiments on text classification tasks demonstrate that the embedded watermark transfers to all downstream tasks and can be effectively extracted and verified. The scheme is robust to watermark-removal attacks (fine-pruning and re-initialization) and secure enough to resist forgery attacks.
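
As a concrete illustration of phase (1), the sketch below maps an owner's digital signature to trigger words by hashing the signature and indexing into the PLM's original vocabulary table. This is a minimal, hypothetical reading of the abstract: the function name, the use of SHA-256, and the number of trigger words are assumptions for illustration, not the paper's actual construction. In the described threat model, the signature itself would be produced with the owner's private key and later checked with the public key.

    import hashlib

    def signature_to_triggers(signature: bytes, vocab: list[str],
                              n_triggers: int = 4) -> list[str]:
        """Derive deterministic trigger words from a digital signature.

        Hypothetical sketch: hash the signature repeatedly and use each
        digest to index the PLM's original vocabulary table, so the
        trigger words are strongly linked to the owner's identity.
        """
        triggers = []
        digest = signature
        for _ in range(n_triggers):
            digest = hashlib.sha256(digest).digest()
            # Interpret the first 4 bytes of the digest as a vocabulary index.
            idx = int.from_bytes(digest[:4], "big") % len(vocab)
            triggers.append(vocab[idx])
        return triggers

Because the mapping is deterministic, anyone holding the owner's public key can re-derive the same trigger words from the published signature and query a suspect model with them during verification.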

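For phase (2), the following is a minimal PyTorch sketch of a supervised contrastive loss that pulls same-label representations together and pushes clean and trigger-set representations apart. Treating "clean" and "trigger" as the two supervision labels is our assumption for illustration; the paper's exact loss formulation, batching, and temperature may differ.

    import torch
    import torch.nn.functional as F

    def supcon_loss(reps: torch.Tensor, labels: torch.Tensor,
                    temperature: float = 0.1) -> torch.Tensor:
        """Supervised contrastive loss over output representations.

        reps:   (N, d) model output representations, clean and triggered mixed
        labels: (N,)   0 for clean samples, 1 for trigger-set samples
        """
        reps = F.normalize(reps, dim=1)
        sim = reps @ reps.t() / temperature            # pairwise similarities
        not_self = ~torch.eye(len(reps), dtype=torch.bool, device=reps.device)
        # Positives are non-self pairs that share a label.
        pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & not_self

        sim = sim.masked_fill(~not_self, float("-inf"))  # drop self-pairs
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

        # Maximize the average log-probability of each anchor's positives.
        mean_log_prob_pos = (log_prob.masked_fill(~pos, 0.0).sum(1)
                             / pos.sum(1).clamp(min=1))
        return -mean_log_prob_pos.mean()

Minimizing this loss drives trigger-set representations into a cluster well separated from clean samples, which is what lets a watermarked model respond anomalously to the trigger set in a task-agnostic way.
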
Published

2023-06-26

How to Cite

Li, P., Cheng, P., Li, F., Du, W., Zhao, H., & Liu, G. (2023). PLMmark: A Secure and Robust Black-Box Watermarking Framework for Pre-trained Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 37(12), 14991-14999. https://doi.org/10.1609/aaai.v37i12.26750

Issue

Vol. 37 No. 12 (2023)

Section

AAAI Special Track on Safe and Robust AI