Modulation-Based Backdoors: Leveraging Amplitude and Frequency Patterns to Attack Speaker Recognition

Authors

  • Hanbo Cai, College of Computer Science and Software Engineering, Hohai University, Nanjing, Jiangsu, China; College of Artificial Intelligence, Suzhou Vocational Institute of Industrial Technology, Suzhou, Jiangsu, China
  • Pengcheng Zhang, College of Computer Science and Software Engineering, Hohai University, Nanjing, Jiangsu, China
  • Yan Xiao, School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University, Shenzhen, China
  • De Li, Computer Science and Engineering, Guangxi Normal University, Guilin, China
  • Hanting Chu, School of Mathematics and Computer Science, Zhejiang Agriculture and Forestry University, Hangzhou, Zhejiang, China
  • Ying Luo, College of Artificial Intelligence, Suzhou Vocational Institute of Industrial Technology, Suzhou, Jiangsu, China

DOI:

https://doi.org/10.1609/aaai.v40i1.36961

Abstract

Deep neural networks (DNNs) are widely and successfully applied in the field of speaker recognition. However, recent studies reveal that these models are vulnerable to backdoor attacks, where adversaries inject malicious behaviors into victim models by poisoning the training process. Existing attack methods often rely on environmental noise or complex voice transformations, which are typically difficult to implement and exhibit poor stealthiness. To address these issues, this paper proposes two modulation-based backdoor attacks that leverage frequency modulation (FM) and amplitude modulation (AM) to construct audio triggers. In real-world scenarios, regular variations in frequency and amplitude are often imperceptible to human listeners, making the proposed attacks more covert. Experimental results show that our methods achieve high attack success rates in both digital and physical settings, while also demonstrating strong resistance to various state-of-the-art backdoor defenses.
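
The abstract names amplitude modulation (AM) and frequency modulation (FM) as the basis of the triggers but does not give the construction. As a rough illustration of what those two operations mean for a waveform, the sketch below applies textbook tremolo (AM) and vibrato (FM via a sinusoidally warped read index) to a test tone. This is not the authors' method: the function names, the 5 Hz modulation rate, the depth, and the shift of 4 samples are all assumptions chosen for illustration only.

```python
import numpy as np


def amplitude_modulate(x, sr, mod_hz=5.0, depth=0.2):
    """Textbook AM (tremolo): scale samples by 1 + depth * sin(2*pi*mod_hz*t).

    mod_hz and depth are illustrative assumptions, not values from the paper.
    """
    t = np.arange(len(x)) / sr
    return x * (1.0 + depth * np.sin(2.0 * np.pi * mod_hz * t))


def frequency_modulate(x, sr, mod_hz=5.0, max_shift_samples=4.0):
    """Textbook vibrato: read the signal at a sinusoidally warped time index.

    Warping the read position slightly raises and lowers the local pitch.
    max_shift_samples is an illustrative assumption, not a value from the paper.
    """
    n = np.arange(len(x), dtype=float)
    warped = n + max_shift_samples * np.sin(2.0 * np.pi * mod_hz * n / sr)
    # Linear interpolation; indices outside the signal are clamped to the ends.
    return np.interp(warped, n, x)


# One second of a 440 Hz tone at 16 kHz stands in for a speech clip.
sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2.0 * np.pi * 440.0 * t)

am = amplitude_modulate(tone, sr)
fm = frequency_modulate(tone, sr)
```

Both functions return a signal of the same length as the input, and setting `depth=0.0` or `max_shift_samples=0.0` leaves the input unchanged, which makes the modulation strength easy to vary in experiments.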

Published

2026-03-14

How to Cite

Cai, H., Zhang, P., Xiao, Y., Li, D., Chu, H., & Luo, Y. (2026). Modulation-Based Backdoors: Leveraging Amplitude and Frequency Patterns to Attack Speaker Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 40(1), 30–38. https://doi.org/10.1609/aaai.v40i1.36961

Section

AAAI Technical Track on Application Domains I