Representing Sets of Instances for Visual Recognition

Authors

  • Jianxin Wu Nanjing University
  • Bin-Bin Gao Nanjing University
  • Guoqing Liu Minieye, Youjia Innovation LLC

DOI:

https://doi.org/10.1609/aaai.v30i1.10184

Abstract

In computer vision, a complex entity such as an image or video is often represented as a set of instance vectors extracted from different parts of that entity. It is therefore essential to design a representation that encodes the information in a set of instances robustly. Existing methods such as the Fisher vector (FV) and the vector of locally aggregated descriptors (VLAD) are designed from a generative perspective, and their performance fluctuates when different types of instance vectors are used (i.e., they are not robust). The proposed D3 method compares two sets as two distributions and introduces a directional total variation distance (DTVD) to measure their dissimilarity. Furthermore, a robust classifier-based method is proposed to estimate DTVD and to represent these sets efficiently. D3 is evaluated on action and image recognition tasks, where it achieves excellent robustness, accuracy, and speed.
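
The classifier-based estimation mentioned in the abstract builds on a standard fact. Suppose two distributions P and Q are mixed with equal priors, and a classifier predicts Q on a region A. That classifier's error is 1/2 − (Q(A) − P(A))/2, so 1 − 2 × error equals Q(A) − P(A). The Bayes-optimal classifier therefore recovers the total variation distance exactly.

The Python sketch below illustrates only this relation. It is not the D3 method, and it does not implement the paper's DTVD definition or its estimator. Several choices are assumptions of this sketch: the hypothetical function name `projected_tv_lower_bound`, the projection direction `w`, and the restriction to threshold classifiers on the 1-D projection. Because the classifier family is restricted, the result is only a lower bound on the total variation distance between the projected distributions. It coincides with the two-sample Kolmogorov-Smirnov statistic.

```python
import numpy as np


def projected_tv_lower_bound(x, y, w):
    """Hypothetical helper (not from the paper): lower-bound the total variation
    distance between the distributions of x @ w and y @ w using the best
    threshold classifier, assuming the two sets have equal class priors."""
    w = np.asarray(w, dtype=float)
    px = np.asarray(x, dtype=float) @ w  # project every instance of set X onto w
    py = np.asarray(y, dtype=float) @ w  # project every instance of set Y onto w
    best = 0.0  # a threshold below every point separates nothing: gap 0
    # Every distinct split of the pooled projections is reached by some data value.
    for t in np.concatenate([px, py]):
        # Classifier: predict "Y" when the projection exceeds t.
        err = 0.5 * np.mean(px > t) + 0.5 * np.mean(py <= t)
        # 1 - 2*err equals empirical Q(A) - P(A) for A = {v > t}; abs() also
        # covers the flipped classifier, whose error is 1 - err.
        best = max(best, abs(1.0 - 2.0 * err))
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two toy sets of 2-D instance vectors; only the first coordinate differs.
    X = rng.normal(0.0, 1.0, size=(500, 2))
    Y = rng.normal([1.0, 0.0], 1.0, size=(500, 2))
    print("along w=[1, 0]:", projected_tv_lower_bound(X, Y, [1.0, 0.0]))
    print("along w=[0, 1]:", projected_tv_lower_bound(X, Y, [0.0, 1.0]))
```

For reference, the exact total variation distance between N(0, 1) and N(1, 1) is 2Φ(0.5) − 1 ≈ 0.38. The first printed value should therefore be roughly that size. The second is measured along a direction in which the two sets do not differ, so it should be much smaller.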

Published

2016-03-02

How to Cite

Wu, J., Gao, B.-B., & Liu, G. (2016). Representing Sets of Instances for Visual Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 30(1). https://doi.org/10.1609/aaai.v30i1.10184

Section

Technical Papers: Machine Learning Methods