NADIR: Differential Attention Flow for Non-Autoregressive Transliteration in Indic Languages

Lakshya Tomar; Vinayak Abrol; Puneet Agarwal

doi:10.1609/aaai.v40i31.39796

Authors

Lakshya Tomar, RocketFrog AI
Vinayak Abrol, Indraprastha Institute of Information Technology, Delhi
Puneet Agarwal, RocketFrog AI

DOI:

https://doi.org/10.1609/aaai.v40i31.39796

Abstract

In this work, we argue that not all sequence-to-sequence tasks require the strong inductive biases of autoregressive (AR) models. Tasks such as multilingual transliteration, code refactoring, grammatical correction, and text normalization often rely on local dependencies, so the full modeling capacity of AR models can be overkill, creating a trade-off between their high accuracy and their high inference latency. Non-autoregressive (NAR) models offer speed, but they typically suffer from hallucinations and poor length control. To explore this trade-off, we focus on multilingual transliteration in Indic languages and introduce NADIR, a novel NAR architecture designed to balance speed and accuracy. NADIR integrates a Differential Transformer and a Mixture-of-Experts mechanism, enabling it to robustly model complex character mappings without sequential dependencies. NADIR achieves a speed-up of more than 13× over the state-of-the-art AR baseline. It maintains a competitive mean Character Error Rate of 15.78%, compared to 14.44% for the AR model and 21.88% for a standard NAR equivalent. Importantly, NADIR reduces Repetition errors by 49.53%, Substitution errors by 24.45%, Omission errors by 32.92%, and Insertion errors by 16.87%. This work provides a practical blueprint for building fast and reliable NAR systems, bridging the gap between AR accuracy and the demands of real-time, large-scale deployment.
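
The abstract reports results as a mean Character Error Rate (CER). As a minimal sketch of how such a figure is commonly computed, the block below takes CER to be the Levenshtein edit distance between the predicted and reference strings, divided by the reference length, and averages it over pairs. This is the conventional definition, not one confirmed by the abstract; the function names `edit_distance`, `cer`, and `mean_cer` are illustrative, and the code counts Unicode code points rather than grapheme clusters, which may differ from the authors' evaluation for Indic scripts.

```python
from typing import Iterable, Tuple


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    # prev[j] holds the distance between the first i-1 chars of ref
    # and the first j chars of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # omission: ref char missing from hyp
                curr[j - 1] + 1,     # insertion: extra char in hyp
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate of one prediction (assumed definition)."""
    if not ref:
        raise ValueError("reference string must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def mean_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Unweighted mean of per-pair CER over (reference, hypothesis) pairs."""
    scores = [cer(ref, hyp) for ref, hyp in pairs]
    if not scores:
        raise ValueError("at least one pair is required")
    return sum(scores) / len(scores)
```

Under this definition, a reported value of 15.78% corresponds to `mean_cer(...)` returning about 0.1578. Whether the paper averages per word, per language, or over the whole corpus is not stated in the abstract, so the unweighted per-pair mean here is an assumption.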
