SMART Frame Selection for Action Recognition

Shreyank N Gowda; Marcus Rohrbach; Laura Sevilla-Lara

doi:10.1609/aaai.v35i2.16235

Authors

Shreyank N Gowda University of Edinburgh
Marcus Rohrbach Facebook AI Research
Laura Sevilla-Lara University of Edinburgh

DOI:

https://doi.org/10.1609/aaai.v35i2.16235

Keywords:

Video Understanding & Activity Analysis

Abstract

Video classification is computationally expensive. In this paper, we address theproblem of frame selection to reduce the computational cost of video classification.Recent work has successfully leveraged frame selection for long, untrimmed videos,where much of the content is not relevant, and easy to discard. In this work, however,we focus on the more standard short, trimmed video classification problem. Weargue that good frame selection can not only reduce the computational cost of videoclassification but also increase the accuracy by getting rid of frames that are hard toclassify. In contrast to previous work, we propose a method that instead of selectingframes by considering one at a time, considers them jointly. This results in a moreefficient selection, where “good" frames are more effectively distributed over thevideo, like snapshots that tell a story. We call the proposed frame selection SMARTand we test it in combination with different backbone architectures and on multiplebenchmarks (Kinetics [5], Something-something [14], UCF101 [31]). We showthat the SMART frame selection consistently improves the accuracy compared toother frame selection strategies while reducing the computational cost by a factorof 4 to 10 times. Additionally, we show that when the primary goal is recognitionperformance, our selection strategy can improve over recent state-of-the-art modelsand frame selection strategies on various benchmarks (UCF101, HMDB51 [21],FCVID [17], and ActivityNet [4]).

SMART Frame Selection for Action Recognition

Authors

DOI:

Keywords:

Abstract

Downloads

Published

How to Cite

Issue

Section

Information

Developed By

Subscription