Relational Prototypical Network for Weakly Supervised Temporal Action Localization


  • Linjiang Huang CASIA
  • Yan Huang CASIA
  • Wanli Ouyang University of Sydney
  • Liang Wang CASIA



In this paper, we propose a weakly supervised temporal action localization method for untrimmed videos based on prototypical networks. We observe two challenges posed by weak supervision, namely action-background separation and action relation construction. Unlike previous methods, we propose to achieve action-background separation using only the original videos. To this end, a clustering loss is adopted to separate actions from backgrounds and learn intra-class compact features, which helps in detecting complete action instances. In addition, a similarity weighting module is devised to further separate actions from backgrounds. To identify actions effectively, we propose to construct relations among actions for prototype learning. A GCN-based prototype embedding module is introduced to generate relational prototypes. Experiments on the THUMOS14 and ActivityNet1.2 datasets show that our method outperforms the state-of-the-art methods.
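To make the relational-prototype idea concrete, the following is a minimal NumPy sketch, not the authors' implementation: class prototypes are related to each other through one graph-convolution step over a similarity-based adjacency, and snippets are then scored by cosine similarity to the refined prototypes. All function names (`gcn_refine_prototypes`, `classify_snippets`), dimensions, and the specific adjacency and activation choices are illustrative assumptions.

```python
import numpy as np

def normalize(x, axis=-1):
    """L2-normalize vectors along the given axis (for cosine similarity)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)

def gcn_refine_prototypes(prototypes, weight):
    """One illustrative GCN layer over class prototypes.

    The adjacency is built from pairwise prototype similarity
    (a stand-in for the paper's learned action relations).
    """
    p = normalize(prototypes)
    adj = np.exp(p @ p.T)                    # positive pairwise affinities
    adj = adj / adj.sum(axis=1, keepdims=True)  # row-normalized adjacency
    return np.tanh(adj @ prototypes @ weight)   # aggregate, transform, activate

def classify_snippets(features, prototypes):
    """Score each video snippet against each relational prototype (cosine)."""
    return normalize(features) @ normalize(prototypes).T

rng = np.random.default_rng(0)
feats = rng.standard_normal((5, 8))    # 5 video snippets, 8-dim features
protos = rng.standard_normal((3, 8))   # 3 action-class prototypes
W = rng.standard_normal((8, 8)) * 0.1  # GCN weight matrix (hypothetical)

rel_protos = gcn_refine_prototypes(protos, W)
scores = classify_snippets(feats, rel_protos)
print(scores.shape)  # (5, 3): one similarity score per snippet per class
```

In the actual method, such per-snippet class scores would feed a temporal localization stage; this sketch only shows the prototype-refinement and scoring structure.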




How to Cite

Huang, L., Huang, Y., Ouyang, W., & Wang, L. (2020). Relational Prototypical Network for Weakly Supervised Temporal Action Localization. Proceedings of the AAAI Conference on Artificial Intelligence, 34(07), 11053-11060.



AAAI Technical Track: Vision