Action Prediction From Videos via Memorizing Hard-to-Predict Samples

Authors

  • Yu Kong, Northeastern University
  • Shangqian Gao, Northeastern University
  • Bin Sun, Northeastern University
  • Yun Fu, Northeastern University

DOI:

https://doi.org/10.1609/aaai.v32i1.12324

Keywords:

Action prediction, Action recognition, Video analysis, Deep learning

Abstract

Action prediction from video is an important problem in computer vision with many applications, such as preventing accidents and criminal activities. Predicting actions at an early stage is challenging because of the large variation between partially observed videos and complete ones. In addition, intra-class variations confuse predictors. In this paper, we propose a mem-LSTM model to predict actions at an early stage, in which a memory module is introduced to record several "hard-to-predict" samples and a variety of early observations. Our method uses a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to model partially observed video input. We augment the LSTM with a memory module to remember challenging video instances. With the memory module, our mem-LSTM model not only achieves impressive performance at the early stage but also makes predictions without prior knowledge of the observation ratio. Information in future frames is also utilized via a bidirectional LSTM layer. Experiments on the UCF-101 and Sports-1M datasets show that our method outperforms state-of-the-art methods.
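To make the architecture described above concrete, the following is a minimal PyTorch-style sketch of the general idea (per-frame CNN features fed to a bidirectional LSTM, augmented with an external memory that is read by soft attention). It is not the authors' released code; the module names, dimensions, and the cosine-similarity memory addressing are illustrative assumptions.

    # Minimal sketch of the mem-LSTM idea from the abstract, NOT the authors'
    # implementation. Dimensions and the memory-addressing scheme are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MemLSTMSketch(nn.Module):
        def __init__(self, feat_dim=2048, hidden_dim=512, mem_slots=128,
                     num_classes=101):
            super().__init__()
            # Bidirectional LSTM over per-frame CNN features (the CNN itself,
            # e.g. a pretrained network producing feat_dim features, is omitted).
            self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True,
                                bidirectional=True)
            # External memory: learned key/value slots intended to store
            # representations of "hard-to-predict" samples.
            self.mem_keys = nn.Parameter(torch.randn(mem_slots, 2 * hidden_dim))
            self.mem_vals = nn.Parameter(torch.randn(mem_slots, 2 * hidden_dim))
            self.classifier = nn.Linear(4 * hidden_dim, num_classes)

        def forward(self, frame_feats):
            # frame_feats: (batch, time, feat_dim) features of the partially
            # observed video; time may vary, so no observation ratio is needed.
            out, _ = self.lstm(frame_feats)
            query = out[:, -1]                       # last-step summary, (B, 2H)
            # Soft attention over memory slots via cosine similarity.
            sim = F.cosine_similarity(query.unsqueeze(1),
                                      self.mem_keys.unsqueeze(0), dim=-1)
            attn = F.softmax(sim, dim=-1)            # (B, mem_slots)
            read = attn @ self.mem_vals              # (B, 2H) memory readout
            return self.classifier(torch.cat([query, read], dim=-1))

    # Usage: logits = MemLSTMSketch()(torch.randn(2, 16, 2048))  # (2, 101)

Concatenating the LSTM summary with the memory readout lets the classifier fall back on remembered hard cases when the partial observation alone is ambiguous; the paper itself should be consulted for the actual memory update and training procedure.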

Published

2018-04-27

How to Cite

Kong, Y., Gao, S., Sun, B., & Fu, Y. (2018). Action Prediction From Videos via Memorizing Hard-to-Predict Samples. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://doi.org/10.1609/aaai.v32i1.12324