Video Imprint Segmentation for Temporal Action Detection in Untrimmed Videos


  • Zhanning Gao (Alibaba Group)
  • Le Wang (Xi'an Jiaotong University)
  • Qilin Zhang (HERE Technologies)
  • Zhenxing Niu (Alibaba Group)
  • Nanning Zheng (Xi'an Jiaotong University)
  • Gang Hua (Microsoft Cloud and AI)



We propose a temporal action detection by spatial segmentation framework that simultaneously categorizes actions and temporally localizes action instances in untrimmed videos. The core idea is to convert the temporal detection task into a spatial semantic segmentation task. First, the video imprint representation is employed to capture the spatial/temporal interdependences within/among frames and represent them as spatial proximity in a feature space. Next, the resulting imprint representation is spatially segmented by a fully convolutional network. When these segmentation labels are projected back to the video space, both temporal action boundary localization and per-frame spatial annotation are obtained simultaneously. Because the underlying imprint representation has a fixed size, the proposed framework is robust to the variable lengths of untrimmed videos. The efficacy of the framework is validated on two public action detection datasets.
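The final step, projecting segmentation labels back to the video to recover action boundaries, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that each frame has already been assigned to one cell of the fixed-size imprint grid (the hypothetical `frame_to_cell` list) and that the segmentation network has produced one integer class label per cell (`seg_map`). It also treats label `0` as a hypothetical background class. Frame labels are read back from the map, and runs of consecutive frames that share a non-background label become detected action instances.

```python
from typing import List, Optional, Tuple

BACKGROUND = 0  # hypothetical label id meaning "no action"


def frame_labels_from_segmentation(
    seg_map: List[List[int]],
    frame_to_cell: List[Tuple[int, int]],
) -> List[int]:
    """Read each frame's class from the grid cell it was (hypothetically) mapped to."""
    return [seg_map[row][col] for row, col in frame_to_cell]


def labels_to_segments(labels: List[int]) -> List[Tuple[int, int, int]]:
    """Group consecutive frames with the same non-background label.

    Returns a list of (label, start_frame, end_frame) with end_frame inclusive.
    """
    segments: List[Tuple[int, int, int]] = []
    start: Optional[int] = None
    # Append a background sentinel so that a run reaching the last frame is closed.
    for t, lab in enumerate(labels + [BACKGROUND]):
        if start is not None and lab != labels[start]:
            segments.append((labels[start], start, t - 1))
            start = None
        if start is None and lab != BACKGROUND:
            start = t
    return segments


if __name__ == "__main__":
    # Toy 2x2 "imprint" segmentation map and a six-frame video.
    seg_map = [[0, 1], [1, 2]]
    frame_to_cell = [(0, 0), (0, 1), (1, 0), (0, 0), (1, 1), (1, 1)]
    labels = frame_labels_from_segmentation(seg_map, frame_to_cell)
    print(labels)                      # [0, 1, 1, 0, 2, 2]
    print(labels_to_segments(labels))  # [(1, 1, 2), (2, 4, 5)]
```

Because the segmentation map has a fixed size regardless of video length, only the frame-to-cell assignment grows with the number of frames. This mirrors the robustness to variable-length videos claimed above.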




How to Cite

Gao, Z., Wang, L., Zhang, Q., Niu, Z., Zheng, N., & Hua, G. (2019). Video Imprint Segmentation for Temporal Action Detection in Untrimmed Videos. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 8328-8335.



AAAI Technical Track: Vision