SemanticShift: Robust Semantic Watermarking for Large Language Models

Minghao Li; Neset Tan

doi:10.1609/aaaiss.v8i1.42577

Authors

Minghao Li, University of Auckland
Neset Tan, University of Auckland

Abstract

Large language models (LLMs) have raised increasing concerns around misinformation and plagiarism. Watermarking—embedding identifiable signals into generated text—offers a promising approach for detection. Semantic watermarking enhances robustness by leveraging meaning rather than surface-level token patterns. However, existing semantic techniques often require model retraining or operate within constrained semantic spaces, limiting control over watermark strength, robustness, and cross-lingual generalizability. We introduce SemanticShift, a training-free and semantically grounded watermarking method that injects signals during generation by computing semantic shifts of candidate tokens relative to preceding context, guided by a secret key. SemanticShift uses pre-trained embedding models and is tunable via hyperparameters, offering strong resistance to paraphrasing and syntactic variation. Experiments demonstrate state-of-the-art detection performance, with ROC-AUC > 0.99 on original text and up to 0.96 under strong paraphrasing—outperforming all prior training-free approaches and rivaling training-based methods. Notably, SemanticShift achieves superior accuracy and robustness on models like OPT and LLaMA, showcasing its applicability and effectiveness.
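
The generation-time mechanism described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how the semantic shift is computed or how the key enters, so everything below is an assumption made for illustration. In particular, the random toy embeddings (standing in for a pre-trained embedding model), the mean-of-prefix context vector, the key-derived direction, the logit bias `delta` (standing in for a strength hyperparameter), and the hit-rate detector are all hypothetical choices.

```python
import hashlib
import math
import random

DIM = 16  # toy embedding size (assumption)


def key_direction(secret_key: str, dim: int = DIM) -> list[float]:
    """Derive a unit vector from the secret key (hypothetical scheme)."""
    seed = int.from_bytes(hashlib.sha256(secret_key.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def toy_embeddings(vocab_size: int, dim: int = DIM, seed: int = 0) -> list[list[float]]:
    """Random vectors standing in for a pre-trained embedding model."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]


def context_vector(tokens: list[int], emb: list[list[float]]) -> list[float]:
    """Mean embedding of the preceding tokens."""
    dim = len(emb[0])
    if not tokens:
        return [0.0] * dim
    return [sum(emb[t][i] for t in tokens) / len(tokens) for i in range(dim)]


def shift_score(token: int, ctx: list[float], emb: list[list[float]],
                direction: list[float]) -> float:
    """Projection of the shift (candidate - context) onto the key direction."""
    return sum((e - c) * d for e, c, d in zip(emb[token], ctx, direction))


def watermark_logits(logits: list[float], tokens: list[int], emb: list[list[float]],
                     direction: list[float], delta: float = 3.0) -> list[float]:
    """Add `delta` to every candidate whose shift aligns with the key direction."""
    ctx = context_vector(tokens, emb)
    return [l + delta if shift_score(t, ctx, emb, direction) > 0 else l
            for t, l in enumerate(logits)]


def generate(n: int, emb: list[list[float]], direction=None,
             delta: float = 3.0, seed: int = 1) -> list[int]:
    """Sample n tokens; random logits stand in for a language model."""
    rng = random.Random(seed)
    tokens = [rng.randrange(len(emb))]
    for _ in range(n):
        logits = [rng.gauss(0.0, 1.0) for _ in emb]
        if direction is not None:
            logits = watermark_logits(logits, tokens, emb, direction, delta)
        weights = [math.exp(l) for l in logits]
        tokens.append(rng.choices(range(len(emb)), weights=weights)[0])
    return tokens


def detect(tokens: list[int], emb: list[list[float]], direction: list[float]) -> float:
    """Fraction of tokens whose shift from their prefix aligns with the key."""
    hits = 0
    for i in range(1, len(tokens)):
        ctx = context_vector(tokens[:i], emb)
        if shift_score(tokens[i], ctx, emb, direction) > 0:
            hits += 1
    return hits / max(1, len(tokens) - 1)


if __name__ == "__main__":
    emb = toy_embeddings(200)
    direction = key_direction("demo-key")
    marked = generate(200, emb, direction)
    plain = generate(200, emb)
    print("watermarked hit rate:", detect(marked, emb, direction))
    print("unmarked hit rate:   ", detect(plain, emb, direction))
```

In this sketch, detection needs only the key and the embedding table, not the language model, and raising `delta` trades text quality for a stronger signal. A detector that thresholds the hit rate is the simplest option; the ROC-AUC figures reported in the abstract come from the authors' own detector, not from anything like this toy.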
