WaveEx: Accelerating Flow Matching-based Speech Generation via Wavelet-guided Extrapolation

Authors

  • Xiaoqian Liu, Northeastern University; Shanghai Jiao Tong University
  • Xiyan Gui, Huazhong University of Science and Technology; Shanghai Jiao Tong University
  • Zhengkun Ge, Northeastern University
  • Yuan Ge, Northeastern University
  • Chang Zou, Shanghai Jiao Tong University
  • Jiacheng Liu, Shanghai Jiao Tong University
  • Zhikang Niu, Shanghai Jiao Tong University
  • Qixi Zheng, Shanghai Jiao Tong University
  • Chen Xu, Harbin Engineering University
  • Xie Chen, Shanghai Jiao Tong University
  • Tong Xiao, Northeastern University; NiuTrans Research
  • Jingbo Zhu, Northeastern University; NiuTrans Research
  • Linfeng Zhang, Shanghai Jiao Tong University

DOI:

https://doi.org/10.1609/aaai.v40i38.40490

Abstract

Flow matching-based generative models offer a principled approach to modeling continuous-time dynamics in speech generation. However, inference is often computationally expensive because ODE solvers require repeated neural network evaluations. We propose WaveEx, a training-free, plug-in acceleration framework that replaces portions of ODE integration with wavelet-guided extrapolation. By leveraging the multi-scale structure of latent trajectories, WaveEx predicts future states directly in the frequency domain, without additional model evaluations or architectural changes. WaveEx consistently accelerates inference across diverse speech generation tasks. The gains are especially pronounced in tasks such as speech synthesis (up to 5.73× speedup) and music generation (2.75×), where flow matching plays a central role in alignment modeling and dense ODE integration. Even in tasks with simpler input-output mappings, such as speech enhancement (4.55×) and voice conversion (2.75×), WaveEx still achieves notable acceleration, demonstrating the robustness and generalizability of the approach. These results highlight wavelet-guided extrapolation as a lightweight and broadly applicable alternative to full ODE solving for flow matching-based speech generation.
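
The idea of replacing some solver steps with extrapolation can be illustrated with a toy sketch. The code below is not the authors' implementation: the abstract does not specify the wavelet, the axis it is applied along, or the step schedule, so the one-level Haar transform, the rule "linearly extrapolate the low band, reuse the high band", the Euler solver, and the names `extrapolate_velocity`, `sample`, and `full_every` are all assumptions made for illustration.

```python
import math


def haar_forward(x):
    """One-level orthonormal Haar transform of an even-length list."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Inverse of haar_forward."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def extrapolate_velocity(v_prev, v_curr):
    """Guess the next velocity from two cached ones, band by band.

    Assumption (not from the paper): the low-frequency band is
    extrapolated linearly, the high-frequency band is reused as-is.
    """
    a0, _ = haar_forward(v_prev)
    a1, d1 = haar_forward(v_curr)
    a_next = [2.0 * b - a for a, b in zip(a0, a1)]
    return haar_inverse(a_next, d1)


def sample(velocity_fn, x0, num_steps, full_every=2):
    """Euler integration from t=0 to t=1 with skipped model calls.

    velocity_fn stands in for the flow matching network. After a
    two-step warm-up, only steps with k % full_every == 0 call it;
    the other steps use the extrapolated velocity instead.
    Returns the final state and the number of model calls made.
    """
    dt = 1.0 / num_steps
    x = list(x0)
    cache = []  # the two most recent velocities
    calls = 0
    for k in range(num_steps):
        t = k * dt
        if len(cache) < 2 or k % full_every == 0:
            v = velocity_fn(x, t)
            calls += 1
        else:
            v = extrapolate_velocity(cache[-2], cache[-1])
        cache = (cache + [v])[-2:]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x, calls
```

With `num_steps=8` and `full_every=2`, this schedule calls the model on 5 of the 8 steps. How aggressive the skipping can be without hurting quality is exactly what the paper evaluates; this sketch makes no claim about that.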

Published

2026-03-14

How to Cite

Liu, X., Gui, X., Ge, Z., Ge, Y., Zou, C., Liu, J., … Zhang, L. (2026). WaveEx: Accelerating Flow Matching-based Speech Generation via Wavelet-guided Extrapolation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(38), 32177–32185. https://doi.org/10.1609/aaai.v40i38.40490

Section

AAAI Technical Track on Natural Language Processing III