Hidden in the Noise: Unveiling Backdoors in Audio LLMs Alignment Through Latent Acoustic Pattern Triggers

Authors

  • Liang Lin Institute of Information Engineering, Chinese Academy of Sciences
  • Miao Yu University of Science and Technology of China
  • Kaiwen Luo North China Electric Power University
  • Yibo Zhang Beijing University of Posts and Telecommunications
  • Lilan Peng Southwest Jiaotong University
  • Dexian Wang Chengdu University of Traditional Chinese Medicine
  • Xuehai Tang Institute of Information Engineering, Chinese Academy of Sciences
  • Yuanhe Zhang Beijing University of Posts and Telecommunications
  • Xikang Yang Institute of Information Engineering, Chinese Academy of Sciences
  • Zhenhong Zhou Nanyang Technological University
  • Kun Wang Nanyang Technological University
  • Yang Liu Nanyang Technological University

DOI:

https://doi.org/10.1609/aaai.v40i38.40472

Abstract

As Audio Large Language Models (ALLMs) emerge as powerful tools for speech processing, their safety implications demand urgent attention. While considerable research has explored textual and vision safety, audio’s distinct characteristics present significant challenges. This paper first investigates the question: are ALLMs vulnerable to backdoor attacks that exploit acoustic triggers? To answer it, we introduce Hidden in the Noise (HIN), a novel backdoor attack framework designed to exploit subtle, audio-specific features. HIN applies acoustic modifications to raw audio waveforms, such as alterations to temporal dynamics and strategic injection of spectrally tailored noise. These changes introduce consistent patterns that an ALLM’s acoustic feature encoder captures, embedding robust triggers in the audio stream. To evaluate ALLM robustness against audio-feature-based triggers, we develop the AudioSafe benchmark, which assesses nine distinct risk types. Extensive experiments on AudioSafe and three established safety datasets reveal critical vulnerabilities in existing ALLMs: (I) audio features such as environmental noise and speech-rate variation achieve an average attack success rate above 90%; (II) ALLMs exhibit markedly different sensitivities across acoustic features, responding only minimally to volume as a trigger; and (III) including poisoned samples causes only marginal fluctuations in the loss curve, highlighting the attack’s stealth.
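The abstract's two example trigger types — temporal-dynamics alteration and spectrally tailored noise — can be illustrated with a minimal sketch. The paper's actual trigger parameters and synthesis pipeline are not given here; the band limits, SNR, rate factor, and function names below are illustrative assumptions, using plain NumPy.

```python
import numpy as np

def inject_spectral_noise(wave, sr=16000, band=(3000, 4000), snr_db=25.0, seed=0):
    """Add band-limited noise (a simple stand-in for the paper's
    'spectrally tailored noise' trigger) at a chosen SNR."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(len(wave))
    # Keep only the target frequency band via FFT masking.
    spec = np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(len(wave), d=1.0 / sr)
    spec[(freqs < band[0]) | (freqs > band[1])] = 0.0
    noise = np.fft.irfft(spec, n=len(wave))
    # Scale the noise so the signal-to-noise ratio equals snr_db.
    sig_pow = np.mean(wave ** 2) + 1e-12
    noise_pow = np.mean(noise ** 2) + 1e-12
    noise *= np.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10)))
    return wave + noise

def change_speech_rate(wave, factor=1.1):
    """Naive rate change by linear resampling (a stand-in for the
    temporal-dynamics trigger; a real pipeline would use
    pitch-preserving time stretching)."""
    n_out = int(len(wave) / factor)
    idx = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(idx, np.arange(len(wave)), wave)

# Example: poison a 1-second dummy waveform (440 Hz tone).
sr = 16000
t = np.arange(sr) / sr
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
poisoned = change_speech_rate(inject_spectral_noise(clean, sr=sr), factor=1.1)
```

Because both modifications are mild at the waveform level, poisoned samples sound close to the originals, which is consistent with the abstract's observation that their inclusion barely perturbs the training loss curve.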

Published

2026-03-14

How to Cite

Lin, L., Yu, M., Luo, K., Zhang, Y., Peng, L., Wang, D., … Liu, Y. (2026). Hidden in the Noise: Unveiling Backdoors in Audio LLMs Alignment Through Latent Acoustic Pattern Triggers. Proceedings of the AAAI Conference on Artificial Intelligence, 40(38), 32015–32023. https://doi.org/10.1609/aaai.v40i38.40472

Section

AAAI Technical Track on Natural Language Processing III