SemanticShift: Robust Semantic Watermarking for Large Language Models

Authors

  • Minghao Li, University of Auckland
  • Neset Tan, University of Auckland

DOI:

https://doi.org/10.1609/aaaiss.v8i1.42577

Abstract

Large language models (LLMs) have raised increasing concerns around misinformation and plagiarism. Watermarking, the embedding of identifiable signals into generated text, offers a promising approach for detection. Semantic watermarking enhances robustness by leveraging meaning rather than surface-level token patterns. However, existing semantic techniques often require model retraining or operate within constrained semantic spaces, limiting control over watermark strength, robustness, and cross-lingual generalizability. We introduce SemanticShift, a training-free and semantically grounded watermarking method that injects signals during generation by computing semantic shifts of candidate tokens relative to preceding context, guided by a secret key. SemanticShift uses pre-trained embedding models and is tunable via hyperparameters, offering strong resistance to paraphrasing and syntactic variation. Experiments demonstrate state-of-the-art detection performance, with ROC-AUC > 0.99 on original text and up to 0.96 under strong paraphrasing, outperforming all prior training-free approaches and rivaling training-based methods. Notably, SemanticShift achieves superior accuracy and robustness on models such as OPT and LLaMA.
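
The abstract does not give the algorithm itself, so the following is only an illustrative sketch of the general idea it describes: biasing candidate tokens at generation time according to how their embedding shifts relative to the preceding context, in a direction derived from a secret key. Everything here is an assumption made for illustration and is not taken from the paper: the hash-based `toy_embedding` stands in for a real pre-trained embedding model, and `key_direction`, `shift_score`, `bias_logits`, `detect_z`, the additive `delta` boost, and the 0.5 null hit rate in the detector are all hypothetical names and choices.

```python
from __future__ import annotations

import hashlib
import math
import random

DIM = 16  # toy embedding size (assumption; real embedding models are far larger)


def toy_embedding(token: str) -> list[float]:
    """Deterministic pseudo-random unit vector; a stand-in for a pre-trained embedding."""
    seed = int.from_bytes(hashlib.sha256(token.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    vec = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def key_direction(secret_key: str) -> list[float]:
    """Hypothetical: map the secret key to a fixed unit direction in embedding space."""
    return toy_embedding("key:" + secret_key)


def context_vector(context: list[str]) -> list[float]:
    """Mean of the context token embeddings (zero vector for an empty context)."""
    if not context:
        return [0.0] * DIM
    embs = [toy_embedding(t) for t in context]
    return [sum(col) / len(embs) for col in zip(*embs)]


def shift_score(context: list[str], token: str, direction: list[float]) -> float:
    """Project the shift (token embedding minus context mean) onto the key direction."""
    ctx = context_vector(context)
    emb = toy_embedding(token)
    return sum((e - c) * d for e, c, d in zip(emb, ctx, direction))


def bias_logits(logits: dict[str, float], context: list[str],
                secret_key: str, delta: float = 2.0) -> dict[str, float]:
    """Add `delta` to every candidate whose shift aligns with the key direction."""
    direction = key_direction(secret_key)
    return {
        tok: logit + (delta if shift_score(context, tok, direction) > 0 else 0.0)
        for tok, logit in logits.items()
    }


def detect_z(tokens: list[str], secret_key: str) -> float:
    """z-score of the count of key-aligned tokens, assuming a 0.5 hit rate without a watermark."""
    direction = key_direction(secret_key)
    n = len(tokens) - 1  # the first token has no preceding context to shift from
    hits = sum(shift_score(tokens[:i], tokens[i], direction) > 0 for i in range(1, len(tokens)))
    return (hits - 0.5 * n) / math.sqrt(0.25 * n)
```

In this sketch `delta` plays the role of a tunable watermark-strength hyperparameter, and detection needs only the text and the key. A larger `detect_z` value would suggest the text was generated with the key, under the stated assumptions.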

Published

2026-05-18

How to Cite

Li, M., & Tan, N. (2026). SemanticShift: Robust Semantic Watermarking for Large Language Models. Proceedings of the AAAI Symposium Series, 8(1), 455–464. https://doi.org/10.1609/aaaiss.v8i1.42577

Section

Machine Learning and Knowledge Engineering (MAKE 2026)