Visual Question Answer Diversity

Chun-Ju Yang; Kristen Grauman; Danna Gurari

doi:10.1609/hcomp.v6i1.13341

Authors

Chun-Ju Yang, University of Texas at Austin
Kristen Grauman, University of Texas at Austin
Danna Gurari, University of Texas at Austin

Abstract

Visual questions (VQs) can lead multiple people to respond with different answers rather than a single, agreed-upon response. Moreover, the answers from a crowd can differ in how many unique answers they contain and in the relative frequency of each. Such answer diversity arises for a variety of reasons, including VQs being subjective, difficult, or ambiguous. We propose a new problem of predicting the answer distribution that would be observed from a crowd for any given VQ; i.e., the number of unique answers and their relative frequencies. Our experiments confirm that the answer distribution can be predicted accurately for VQs asked by both blind and sighted people. We then propose a novel crowd-powered VQA system that uses the answer distribution predictions to reason about how many answers are needed to capture the diversity of possible human responses. Experiments demonstrate that the proposed system captures the diversity of answers with considerably less human effort than is required with a state-of-the-art system.
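
As a rough illustration of the quantity the abstract describes, the sketch below computes an empirical answer distribution (the number of unique answers and their relative frequencies) from a list of crowd answers. It is not the authors' method: the function name `answer_distribution`, the lowercase-and-strip normalization, and the sample answers are all assumptions made for this example.

```python
from collections import Counter


def answer_distribution(answers):
    """Return (number of unique answers, {answer: relative frequency}).

    Hypothetical helper: answers are normalized by stripping whitespace
    and lowercasing, which is an assumption of this sketch.
    """
    normalized = [a.strip().lower() for a in answers]
    counts = Counter(normalized)
    total = len(normalized)
    # most_common() orders answers from most to least frequent.
    freqs = {ans: n / total for ans, n in counts.most_common()}
    return len(counts), freqs


# Made-up crowd answers for a single visual question.
crowd_answers = ["red", "Red", "maroon", "red", "dark red"]
num_unique, freqs = answer_distribution(crowd_answers)
print(num_unique, freqs)
```

A prediction system of the kind the abstract proposes would estimate this distribution before the answers are collected; the sketch only shows how the target quantity could be measured once crowd answers are available.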
