SMART Frame Selection for Action Recognition


  • Shreyank N Gowda, University of Edinburgh
  • Marcus Rohrbach, Facebook AI Research
  • Laura Sevilla-Lara, University of Edinburgh



Video Understanding & Activity Analysis


Video classification is computationally expensive. In this paper, we address the problem of frame selection to reduce the computational cost of video classification. Recent work has successfully leveraged frame selection for long, untrimmed videos, where much of the content is not relevant and is easy to discard. In this work, however, we focus on the more standard short, trimmed video classification problem. We argue that good frame selection can not only reduce the computational cost of video classification but also increase the accuracy by getting rid of frames that are hard to classify. In contrast to previous work, we propose a method that, instead of selecting frames by considering one at a time, considers them jointly. This results in a more efficient selection, where “good” frames are more effectively distributed over the video, like snapshots that tell a story. We call the proposed frame selection SMART and we test it in combination with different backbone architectures and on multiple benchmarks (Kinetics [5], Something-something [14], UCF101 [31]). We show that the SMART frame selection consistently improves the accuracy compared to other frame selection strategies while reducing the computational cost by a factor of 4 to 10. Additionally, we show that when the primary goal is recognition performance, our selection strategy can improve over recent state-of-the-art models and frame selection strategies on various benchmarks (UCF101, HMDB51 [21], FCVID [17], and ActivityNet [4]).
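The contrast between scoring frames one at a time and considering them jointly can be illustrated with a toy sketch. This is not the SMART model described in the paper: the per-frame scores, the function names, and the distance-based penalty are all hypothetical, chosen only to show how a joint criterion can spread the selected frames across a clip instead of clustering them around a single high-scoring moment.

```python
from typing import List, Sequence


def select_independent(scores: Sequence[float], k: int) -> List[int]:
    """Pick the k frames with the highest individual scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def select_jointly(scores: Sequence[float], k: int, spread: float = 1.0) -> List[int]:
    """Greedy joint selection (illustrative assumption, not the paper's method).

    Each candidate is scored given the frames already chosen: its own score
    minus a penalty that shrinks with temporal distance to the nearest
    already-selected frame.
    """
    chosen: List[int] = []
    candidates = set(range(len(scores)))

    def gain(i: int) -> float:
        if not chosen:
            return scores[i]
        nearest = min(abs(i - j) for j in chosen)
        return scores[i] - spread / nearest

    while candidates and len(chosen) < k:
        # Iterate in index order so ties are broken deterministically.
        best = max(sorted(candidates), key=gain)
        chosen.append(best)
        candidates.remove(best)
    return sorted(chosen)


# Hypothetical per-frame "usefulness" scores for a 10-frame clip.
scores = [0.2, 0.9, 0.95, 0.9, 0.3, 0.1, 0.6, 0.2, 0.1, 0.7]
independent = select_independent(scores, 3)
joint = select_jointly(scores, 3)
print(independent, joint)
```

In this toy clip the independent selection returns three adjacent frames near the start, while the joint selection returns frames from the start, middle, and end of the clip.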




How to Cite

Gowda, S. N., Rohrbach, M., & Sevilla-Lara, L. (2021). SMART Frame Selection for Action Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 35(2), 1451-1459.



AAAI Technical Track on Computer Vision I