[1] T. Fu, M. Zhao, K. Niu, K. Peng, and B. Li, “OmniPT: Unleashing the Potential of Large Vision Language Models for Pedestrian Tracking and Understanding,” AAAI, vol. 40, no. 5, pp. 4031–4039, Mar. 2026.