SemanticShift: Robust Semantic Watermarking for Large Language Models
DOI: https://doi.org/10.1609/aaaiss.v8i1.42577
Abstract
Large language models (LLMs) have raised growing concerns about misinformation and plagiarism. Watermarking, which embeds identifiable signals into generated text, offers a promising approach to detection. Semantic watermarking improves robustness by relying on meaning rather than on surface-level token patterns. However, existing semantic techniques often require model retraining or operate within constrained semantic spaces, which limits control over watermark strength, robustness, and cross-lingual generalizability. We introduce SemanticShift, a training-free, semantically grounded watermarking method that injects signals during generation by computing the semantic shift of each candidate token relative to the preceding context, guided by a secret key. SemanticShift uses pre-trained embedding models, is tunable via hyperparameters, and offers strong resistance to paraphrasing and syntactic variation. Experiments demonstrate state-of-the-art detection performance, with ROC-AUC above 0.99 on original text and up to 0.96 under strong paraphrasing, outperforming all prior training-free approaches and rivaling training-based methods. SemanticShift also achieves superior accuracy and robustness on models such as OPT and LLaMA, demonstrating its broad applicability.
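The abstract describes the mechanism only at a high level: a key-guided signal derived from the semantic shift of each candidate token relative to the preceding context. The sketch below is a hypothetical toy illustration of that general idea, not the authors' algorithm. Every name and design choice in it is an assumption made for illustration: a unit "secret direction" derived from the key with SHA-256, a context embedding taken as the plain mean of preceding token embeddings, a fixed logit boost `delta`, random vectors standing in for a pre-trained embedding model, a random-logit generator standing in for an LLM, and a one-proportion z-test for detection.

```python
import hashlib
import math
import random
from typing import Optional

DIM = 16  # toy embedding size (assumption); real embedding models are far larger


def key_direction(secret_key: str, dim: int = DIM) -> list[float]:
    """Hypothetical: derive a unit-length secret direction from the key via SHA-256."""
    seed = int.from_bytes(hashlib.sha256(secret_key.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Hypothetical context embedding: the average of the preceding token embeddings."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def shift_aligned(candidate: list[float], context: list[float], direction: list[float]) -> bool:
    """True if the shift (candidate - context) has a positive projection on the key direction."""
    return sum((c - p) * d for c, p, d in zip(candidate, context, direction)) > 0.0


def bias_logits(logits: list[float], candidates: list[list[float]],
                context: list[float], direction: list[float], delta: float) -> list[float]:
    """Add `delta` (the strength knob) to every candidate whose shift is key-aligned."""
    return [logit + delta if shift_aligned(emb, context, direction) else logit
            for logit, emb in zip(logits, candidates)]


def detect_z(embeddings: list[list[float]], direction: list[float]) -> float:
    """One-proportion z-score, assuming each shift is aligned w.p. ~1/2 without a watermark."""
    n = len(embeddings) - 1
    hits = sum(shift_aligned(embeddings[i], mean_vector(embeddings[:i]), direction)
               for i in range(1, len(embeddings)))
    return (hits - 0.5 * n) / math.sqrt(0.25 * n)


def toy_generate(vocab: list[list[float]], n_tokens: int, direction: list[float],
                 rng: random.Random, delta: Optional[float] = None,
                 k: int = 10) -> list[list[float]]:
    """Stand-in for an LLM: greedy choice among k random candidates with random logits."""
    text = [rng.choice(vocab)]
    for _ in range(n_tokens - 1):
        candidates = rng.sample(vocab, k)
        logits = [rng.gauss(0.0, 1.0) for _ in range(k)]
        if delta is not None:  # watermarking on: favor key-aligned semantic shifts
            logits = bias_logits(logits, candidates, mean_vector(text), direction, delta)
        best = max(range(k), key=lambda j: logits[j])
        text.append(candidates[best])
    return text


if __name__ == "__main__":
    rng = random.Random(0)
    vocab = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(1000)]
    d = key_direction("my-secret-key")
    marked = toy_generate(vocab, 300, d, random.Random(1), delta=4.0)
    plain = toy_generate(vocab, 300, d, random.Random(2))
    print(f"z (watermarked) = {detect_z(marked, d):.2f}")
    print(f"z (plain)       = {detect_z(plain, d):.2f}")
```

In this toy, `delta` plays the role of a watermark-strength hyperparameter, and detection needs only the key and the token embeddings, not the generating model. Because the signal is defined in embedding space rather than on exact token identities, substituting tokens with similar embeddings would, in principle, leave many shift signs unchanged; that is the intuition behind semantic robustness, though this sketch does not test paraphrasing.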
Published: 2026-05-18
How to Cite
Li, M., & Tan, N. (2026). SemanticShift: Robust Semantic Watermarking for Large Language Models. Proceedings of the AAAI Symposium Series, 8(1), 455–464. https://doi.org/10.1609/aaaiss.v8i1.42577
Section: Machine Learning and Knowledge Engineering (MAKE 2026)