Cross-View People Tracking by Scene-Centered Spatio-Temporal Parsing

Authors

  • Yuanlu Xu University of California, Los Angeles
  • Xiaobai Liu San Diego State University
  • Lei Qin Chinese Academy of Sciences
  • Song-Chun Zhu University of California, Los Angeles

DOI:

https://doi.org/10.1609/aaai.v31i1.11190

Keywords:

multi-view tracking, human behavior analysis, joint inference, cluster sampling

Abstract

In this paper, we propose a Spatio-temporal Attributed Parse Graph (ST-APG) to integrate semantic attributes with trajectories for cross-view people tracking. Given videos from multiple cameras with overlapping fields of view (FOV), our goal is to parse the videos and organize the trajectories of all targets into a scene-centered representation. Beyond the appearance and geometry features commonly used in the literature, we leverage rich semantic attributes of humans, e.g., facing direction, posture, and action, to enhance cross-view tracklet association. In particular, the facing direction of a human in 3D, once detected, often coincides with his/her moving direction or trajectory. Similarly, the actions of humans, once recognized, provide strong cues for distinguishing one subject from another. Inference is solved by iteratively grouping tracklets with cluster sampling and estimating people's semantic attributes by dynamic programming. In experiments, we validate our method on one public dataset and on a new dataset we created that records people's daily life in public spaces, e.g., a food court, an office reception area, and a plaza, each covered by 3-4 cameras. We evaluate the proposed method on these challenging videos and achieve promising multi-view tracking results.
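The abstract's inference loop alternates two steps: grouping tracklets across views by sampling cluster assignments, and estimating per-frame semantic attributes by dynamic programming. The following is a minimal Python sketch of that alternation, assuming toy features and energies; the feature names, scores, and the greedy stand-in for the paper's cluster-sampling moves are illustrative assumptions, not the actual formulation.

```python
import random

def viterbi_attributes(unary, switch_cost):
    """DP over frames (Viterbi): pick the attribute sequence that maximizes
    per-frame scores minus a penalty each time the attribute switches.
    `unary` is a list of {attribute: score} dicts, one per frame."""
    states = list(unary[0])
    best = dict(unary[0])
    back = []
    for frame in unary[1:]:
        ptr, new_best = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[p] - (0 if p == s else switch_cost))
            new_best[s] = best[prev] - (0 if prev == s else switch_cost) + frame[s]
            ptr[s] = prev
        best = new_best
        back.append(ptr)
    last = max(best, key=best.get)
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

def tracklet_affinity(a, b, w_attr=0.5):
    """Toy affinity: appearance similarity plus a bonus when the dominant
    facing directions of two tracklets agree (the attribute cue above)."""
    appearance = -abs(a["appearance"] - b["appearance"])
    attribute = w_attr if a["facing"] == b["facing"] else -w_attr
    return appearance + attribute

def group_tracklets(tracklets, n_iters=200, seed=0):
    """Greedy stand-in for cluster sampling: repeatedly propose merging a
    tracklet into another's cluster (or splitting it off) and keep moves
    that lower the energy, i.e., negative total within-cluster affinity."""
    rng = random.Random(seed)
    labels = list(range(len(tracklets)))  # each tracklet starts alone

    def energy(lab):
        return -sum(tracklet_affinity(tracklets[i], tracklets[j])
                    for i in range(len(lab)) for j in range(i + 1, len(lab))
                    if lab[i] == lab[j])

    cur = energy(labels)
    for _ in range(n_iters):
        i, j = rng.sample(range(len(tracklets)), 2)
        prop = labels[:]
        # merge j into i's cluster, or split j off if they are together
        prop[j] = labels[i] if labels[i] != labels[j] else max(labels) + 1
        e = energy(prop)
        if e < cur:
            labels, cur = prop, e
    return labels
```

With a noisy middle frame, the DP keeps a consistent facing direction, and tracklets with similar appearance and agreeing attributes end up in the same cluster while dissimilar ones stay apart. The actual paper samples whole connected clusters rather than single tracklets, which mixes much faster on real data.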

Published

2017-02-12

How to Cite

Xu, Y., Liu, X., Qin, L., & Zhu, S.-C. (2017). Cross-View People Tracking by Scene-Centered Spatio-Temporal Parsing. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1). https://doi.org/10.1609/aaai.v31i1.11190