BiCycle: Group-wise Recursive Transformer Based on ASR Mechanism

Authors

  • Min Ho Jang, Chung-Ang University
  • Eun Seo Seo, Chung-Ang University
  • Jin Young Kim, Chung-Ang University
  • Hyeongsoo Lim, Chung-Ang University
  • Ji Won Yoon, Chung-Ang University

DOI:

https://doi.org/10.1609/aaai.v40i37.40386

Abstract

The recursive transformer (RT) is a promising parameter-sharing technique for reducing the computational burden of large-scale models. While RT has been successfully applied to large language models (LLMs), its effectiveness in automatic speech recognition (ASR) remains limited, despite the parallel trend of model scaling in the speech domain. In this paper, we reveal that conventional RT designs for LLMs are suboptimal for speech recognition, primarily because they do not fully consider the layer-wise specialization inherent in the ASR architecture, where lower layers focus on phonetic features and upper layers capture linguistic localization. To address this, we propose BiCycle, a novel RT scheme tailored for ASR. In particular, we first analyze attention patterns in a pre-trained ASR model to divide its layers into phonetic and linguistic groups. BiCycle then constructs an efficient RT model by transferring the pre-trained model’s weights in a step-wise manner and applies recursion separately to the phonetic and linguistic groups, preventing conflicts between their roles. Extensive experimental results confirm that the proposed method not only preserves the original ASR mechanism but also outperforms conventional RT approaches.
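
The group-wise recursion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the cyclic reuse pattern, the `split` index, and the toy blocks are all assumptions made for illustration, and the attention-based layer grouping and step-wise weight transfer are not modeled. The sketch only shows the idea that the lower (phonetic) and upper (linguistic) layer positions each reuse their own small set of shared blocks, so no block is shared across the two groups.

```python
from typing import Callable, Dict, List, Tuple

BlockKey = Tuple[str, int]  # (group name, index of the shared block in that group)


def groupwise_schedule(num_layers: int, split: int, shared_per_group: int) -> List[BlockKey]:
    """Map every layer position to the shared block it reuses.

    Assumption: positions below `split` form the phonetic group, the rest
    form the linguistic group, and each group cycles through its own
    `shared_per_group` blocks.
    """
    if not 0 < split < num_layers:
        raise ValueError("split must fall strictly inside the layer stack")
    if shared_per_group < 1:
        raise ValueError("each group needs at least one shared block")

    schedule: List[BlockKey] = []
    for pos in range(num_layers):
        if pos < split:
            schedule.append(("phonetic", pos % shared_per_group))
        else:
            schedule.append(("linguistic", (pos - split) % shared_per_group))
    return schedule


def run_recursive(x: float, schedule: List[BlockKey],
                  blocks: Dict[BlockKey, Callable[[float], float]]) -> float:
    """Apply the shared blocks in schedule order (toy scalar 'hidden state')."""
    for key in schedule:
        x = blocks[key](x)
    return x


# Toy usage: 6 layer positions, 3 per group, one shared block per group.
schedule = groupwise_schedule(num_layers=6, split=3, shared_per_group=1)
blocks = {
    ("phonetic", 0): lambda v: v + 1.0,    # stand-in for a shared lower-layer block
    ("linguistic", 0): lambda v: v * 2.0,  # stand-in for a shared upper-layer block
}
output = run_recursive(0.0, schedule, blocks)
unique_blocks = len(set(schedule))
```

In this toy setting, six layer positions are served by two distinct blocks, one per group; a conventional RT in the same spirit would instead cycle a single set of shared blocks across the whole depth, mixing lower-layer and upper-layer roles.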

Published

2026-03-14

How to Cite

Jang, M. H., Seo, E. S., Kim, J. Y., Lim, H., & Yoon, J. W. (2026). BiCycle: Group-wise Recursive Transformer Based on ASR Mechanism. Proceedings of the AAAI Conference on Artificial Intelligence, 40(37), 31238–31246. https://doi.org/10.1609/aaai.v40i37.40386

Section

AAAI Technical Track on Natural Language Processing II