Cross-View People Tracking by Scene-Centered Spatio-Temporal Parsing

Authors

  • Yuanlu Xu University of California, Los Angeles
  • Xiaobai Liu San Diego State University
  • Lei Qin Chinese Academy of Sciences
  • Song-Chun Zhu University of California, Los Angeles

DOI:

https://doi.org/10.1609/aaai.v31i1.11190

Keywords:

multi-view tracking, human behavior analysis, joint inference, cluster sampling

Abstract

In this paper, we propose a Spatio-temporal Attributed Parse Graph (ST-APG) to integrate semantic attributes with trajectories for cross-view people tracking. Given videos from multiple cameras with overlapping fields of view (FOV), our goal is to parse the videos and organize the trajectories of all targets into a scene-centered representation. Beyond the appearance and geometry features commonly used in the literature, we leverage rich semantic attributes of humans, e.g., facing direction, posture, and action, to enhance cross-view tracklet association. In particular, the facing direction of a human in 3D, once detected, often coincides with his/her moving direction or trajectory. Similarly, the actions of humans, once recognized, provide strong cues for distinguishing one subject from another. Inference is solved by iteratively grouping tracklets with cluster sampling and estimating people's semantic attributes by dynamic programming. In experiments, we validate our method on one public dataset and on a new dataset we created that records people's daily life in public spaces, e.g., a food court, an office reception area, and a plaza, each covered by 3-4 cameras. We evaluate the proposed method on these challenging videos and achieve promising multi-view tracking results.
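The abstract's inference loop alternates two steps: grouping tracklets across views by sampling cluster assignments, and estimating per-frame semantic attributes by dynamic programming. The following is a minimal Python sketch of that alternation, assuming toy features and energies; the feature names, scores, and the greedy stand-in for the paper's cluster-sampling moves are illustrative assumptions, not the actual formulation.

```python
import random

def viterbi_attributes(unary, switch_cost):
    """DP over frames (Viterbi): pick the attribute sequence that maximizes
    per-frame scores minus a penalty each time the attribute switches.
    `unary` is a list of {attribute: score} dicts, one per frame."""
    states = list(unary[0])
    best = dict(unary[0])
    back = []
    for frame in unary[1:]:
        ptr, new_best = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[p] - (0 if p == s else switch_cost))
            new_best[s] = best[prev] - (0 if prev == s else switch_cost) + frame[s]
            ptr[s] = prev
        best = new_best
        back.append(ptr)
    last = max(best, key=best.get)
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

def tracklet_affinity(a, b, w_attr=0.5):
    """Toy affinity: appearance similarity plus a bonus when the dominant
    facing directions of two tracklets agree (the attribute cue above)."""
    appearance = -abs(a["appearance"] - b["appearance"])
    attribute = w_attr if a["facing"] == b["facing"] else -w_attr
    return appearance + attribute

def group_tracklets(tracklets, n_iters=200, seed=0):
    """Greedy stand-in for cluster sampling: repeatedly propose merging a
    tracklet into another's cluster (or splitting it off) and keep moves
    that lower the energy, i.e., negative total within-cluster affinity."""
    rng = random.Random(seed)
    labels = list(range(len(tracklets)))  # each tracklet starts alone

    def energy(lab):
        return -sum(tracklet_affinity(tracklets[i], tracklets[j])
                    for i in range(len(lab)) for j in range(i + 1, len(lab))
                    if lab[i] == lab[j])

    cur = energy(labels)
    for _ in range(n_iters):
        i, j = rng.sample(range(len(tracklets)), 2)
        prop = labels[:]
        # merge j into i's cluster, or split j off if they are together
        prop[j] = labels[i] if labels[i] != labels[j] else max(labels) + 1
        e = energy(prop)
        if e < cur:
            labels, cur = prop, e
    return labels
```

With a noisy middle frame, the DP keeps a consistent facing direction, and tracklets with similar appearance and agreeing attributes end up in the same cluster while dissimilar ones stay apart. The actual paper samples whole connected clusters rather than single tracklets, which mixes much faster on real data.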

Published

2017-02-12

How to Cite

Xu, Y., Liu, X., Qin, L., & Zhu, S.-C. (2017). Cross-View People Tracking by Scene-Centered Spatio-Temporal Parsing. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1). https://doi.org/10.1609/aaai.v31i1.11190