TY  - JOUR
AU  - Wu, Ruihai
AU  - Xu, Kehan
AU  - Liu, Chenchen
AU  - Zhuang, Nan
AU  - Mu, Yadong
PY  - 2020/04/03
Y2  - 2024/03/29
TI  - Localize, Assemble, and Predicate: Contextual Object Proposal Embedding for Visual Relation Detection
JF  - Proceedings of the AAAI Conference on Artificial Intelligence
JA  - AAAI
VL  - 34
IS  - 07
SE  - AAAI Technical Track: Vision
DO  - 10.1609/aaai.v34i07.6913
UR  - https://ojs.aaai.org/index.php/AAAI/article/view/6913
SP  - 12297-12304
AB  - <p>Visual relation detection (VRD) aims to describe all interacting objects in an image using subject-predicate-object triplets. Critically, the number of valid relations grows combinatorially as <em>O</em>(<em>C</em><sup>2</sup><em>R</em>) for <em>C</em> object categories and <em>R</em> relationships. The frequencies of relation triplets exhibit a long-tailed distribution, which inevitably biases the learned VRD model towards popular visual relations. To address this problem, we propose the localize-assemble-predicate network (LAP-Net), which decomposes VRD into three sub-tasks: localizing individual objects, assembling subject-object pairs, and predicting their relations. In the first stage of LAP-Net, a Region Proposal Network (RPN) generates a few class-agnostic object proposals. Next, these proposals are assembled into subject-object pairs by a second Pair Proposal Network (PPN), in which we propose a novel contextual embedding scheme. The inner product between embedded representations faithfully reflects the compatibility of a pair of proposals, without estimating the subject and object classes. Top-ranked pairs from stage two are fed into a third sub-network, which precisely estimates the relationship. The whole pipeline, except for the last stage, is object-category-agnostic in localizing relationships in an image, alleviating the bias towards popular relations induced by the training data. LAP-Net can be trained in an end-to-end fashion. We demonstrate that LAP-Net achieves state-of-the-art performance on the VRD benchmark while maintaining high inference speed.</p>
ER  - 