Region-Based Optimization in Continual Learning for Audio Deepfake Detection

Authors

  • Yujie Chen School of Computer Science and Technology, Anhui University
  • Jiangyan Yi Department of Automation, Tsinghua University
  • Cunhang Fan School of Computer Science and Technology, Anhui University
  • Jianhua Tao Department of Automation, Tsinghua University; Beijing National Research Center for Information Science and Technology, Tsinghua University
  • Yong Ren Institute of Automation, Chinese Academy of Sciences
  • Siding Zeng Institute of Automation, Chinese Academy of Sciences
  • Chu Yuan Zhang Department of Automation, Tsinghua University
  • Xinrui Yan Institute of Automation, Chinese Academy of Sciences
  • Hao Gu Institute of Automation, Chinese Academy of Sciences
  • Jun Xue School of Computer Science and Technology, Anhui University
  • Chenglong Wang Institute of Automation, Chinese Academy of Sciences
  • Zhao Lv School of Computer Science and Technology, Anhui University
  • Xiaohui Zhang Institute of Automation, Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v39i22.34535

Abstract

Rapid advancements in speech synthesis and voice conversion bring convenience but also new security risks, creating an urgent need for effective audio deepfake detection. Although current models perform well, their effectiveness diminishes when confronted with the diverse and evolving nature of real-world deepfakes. To address this issue, we propose a continual learning method named Region-Based Optimization (RegO) for audio deepfake detection. Specifically, we use the Fisher information matrix to measure important neuron regions for real and fake audio detection, dividing them into four regions. First, we directly fine-tune the less important regions to quickly adapt to new tasks. Next, we apply gradient optimization in parallel for regions important only to real audio detection, and in orthogonal directions for regions important only to fake audio detection. For regions that are important to both, we use sample proportion-based adaptive gradient optimization. This region-adaptive optimization ensures an appropriate trade-off between memory stability and learning plasticity. Additionally, to address the increase of redundant neurons from old tasks, we further introduce the Ebbinghaus forgetting mechanism to release them, thereby promoting the model’s ability to learn more generalized discriminative features. Experimental results show our method achieves a 21.3 percent improvement in EER over the state-of-the-art continual learning approach RWM for audio deepfake detection. Moreover, the effectiveness of RegO extends beyond the audio deepfake detection domain, showing potential significance in other tasks, such as image recognition.
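
The region-wise update described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the quantile threshold, the reference direction `ref` (standing in for a gradient direction retained from old tasks), the way `real_fraction` mixes the two projections in the shared region, and all function names are assumptions made only for illustration. It shows one plausible reading of "four regions" (important to neither class, to real only, to fake only, or to both) and of parallel versus orthogonal gradient projection.

```python
import numpy as np

def partition_regions(fisher_real, fisher_fake, quantile=0.5):
    """Label each parameter by thresholded Fisher importance.

    Returns 0 (neither), 1 (real only), 2 (fake only), 3 (both).
    The quantile threshold is an assumption, not taken from the paper.
    """
    real_imp = fisher_real > np.quantile(fisher_real, quantile)
    fake_imp = fisher_fake > np.quantile(fisher_fake, quantile)
    return real_imp.astype(int) + 2 * fake_imp.astype(int)

def project_parallel(grad, ref):
    """Component of grad that lies along the reference direction ref."""
    denom = ref @ ref
    if denom == 0:
        return np.zeros_like(grad)
    return (grad @ ref) / denom * ref

def project_orthogonal(grad, ref):
    """Component of grad that is orthogonal to ref."""
    return grad - project_parallel(grad, ref)

def region_update(grad, ref, regions, real_fraction):
    """Apply a different gradient rule to each region (hypothetical rules).

    real_fraction is the assumed share of real samples in the new task;
    it weights the two projections in the region important to both classes.
    """
    grad = np.asarray(grad, dtype=float)
    ref = np.asarray(ref, dtype=float)
    out = grad.copy()  # region 0 keeps the raw gradient (plain fine-tuning)
    for label in (1, 2, 3):
        mask = regions == label
        if not mask.any():
            continue
        g, r = grad[mask], ref[mask]
        if label == 1:    # important to real audio only: parallel direction
            out[mask] = project_parallel(g, r)
        elif label == 2:  # important to fake audio only: orthogonal direction
            out[mask] = project_orthogonal(g, r)
        else:             # important to both: proportion-weighted mix
            out[mask] = (real_fraction * project_parallel(g, r)
                         + (1.0 - real_fraction) * project_orthogonal(g, r))
    return out
```

In this sketch the Fisher importance values are taken as given per-parameter arrays; estimating them (for example, from squared gradients of the log-likelihood on real and fake samples) and the Ebbinghaus forgetting step are left out because the abstract does not specify them.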

Published

2025-04-11

How to Cite

Chen, Y., Yi, J., Fan, C., Tao, J., Ren, Y., Zeng, S., … Zhang, X. (2025). Region-Based Optimization in Continual Learning for Audio Deepfake Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 39(22), 23651–23659. https://doi.org/10.1609/aaai.v39i22.34535

Section

AAAI Technical Track on Natural Language Processing I