[1]

M. Zhao, “Making Every Head Count: Sparse Attention Without the Speed-Performance Trade-off”, AAAI, vol. 40, no. 41, pp. 34959–34967, Mar. 2026.