Query-Driven Multi-Instance Learning
We introduce a query-driven approach (qMIL) to multi-instance learning where the queries aim to uncover the class labels embodied in a given bag of instances. Specifically, it solves a multi-instance multi-label learning (MIML) problem with a more challenging setting than the conventional one. Each MIML bag in our formulation is annotated only with a binary label indicating whether the bag contains the instance of a certain class and the query is specified by the word2vec of a class label/name. To learn a deep-net model for qMIL, we construct a network component that achieves a generalized compatibility measure for query-visual co-embedding and yields proper instance attentions to the given query. The bag representation is then formed as the attention-weighted sum of the instances' weights, and passed to the classification layer at the end of the network. In addition, the qMIL formulation is flexible for extending the network to classify unseen class labels, leading to a new technique to solve the zero-shot MIML task through an iterative querying process. Experimental results on action classification over video clips and three MIML datasets from MNIST, CIFAR10 and Scene are provided to demonstrate the effectiveness of our method.