(1) Kim, G.; Seo, M. State-Space Hierarchical Compression With Gated Attention and Learnable Sampling for Hour-Long Video Understanding in Large Multimodal Models. AAAI 2026, 40, 5656-5664.