[1] G. Kim and M. Seo, “State-Space Hierarchical Compression with Gated Attention and Learnable Sampling for Hour-Long Video Understanding in Large Multimodal Models,” AAAI, vol. 40, no. 7, pp. 5656–5664, Mar. 2026.