Bi, H., Yuan, Z., Jia, Z., Zhang, J., Li, C., Luo, P., … Zhang, J. (2026). F2RVLM: Boosting Fine-grained Fragment Retrieval for Multi-Modal Long-form Dialogue with Vision Language Model. Proceedings of the AAAI Conference on Artificial Intelligence, 40(17), 14493–14501. https://doi.org/10.1609/aaai.v40i17.38466