Hierarchical ConViT with Attention-Based Relational Reasoner for Visual Analogical Reasoning

Wentao He; Jialu Zhang; Jianfeng Ren; Ruibin Bai; Xudong Jiang

doi:10.1609/aaai.v37i1.25072

Authors

Wentao He: The Digital Port Technologies Lab, School of Computer Science, University of Nottingham Ningbo China
Jialu Zhang: The Digital Port Technologies Lab, School of Computer Science, University of Nottingham Ningbo China
Jianfeng Ren: The Digital Port Technologies Lab, School of Computer Science, University of Nottingham Ningbo China; Nottingham Ningbo China Beacons of Excellence Research and Innovation Institute, University of Nottingham Ningbo China
Ruibin Bai: The Digital Port Technologies Lab, School of Computer Science, University of Nottingham Ningbo China; Nottingham Ningbo China Beacons of Excellence Research and Innovation Institute, University of Nottingham Ningbo China
Xudong Jiang: School of Electrical & Electronic Engineering, Nanyang Technological University

DOI:

https://doi.org/10.1609/aaai.v37i1.25072

Keywords:

CMS: Analogical and Conceptual Reasoning, CMS: Applications, CV: Representation Learning for Vision, CV: Scene Analysis & Understanding, CV: Visual Reasoning & Symbolic Representations, ML: Relational Learning

Abstract

Raven’s Progressive Matrices (RPMs) have been widely used to evaluate the visual reasoning ability of humans. To tackle the challenges of visual perception and logical reasoning on RPMs, we propose a Hierarchical ConViT with Attention-based Relational Reasoner (HCV-ARR). Existing methods often apply relatively shallow convolutional networks to perceive shape patterns in RPM images, which may not fully model the long-range dependencies of complex pattern combinations. The proposed ConViT consists of a convolutional block that captures the low-level attributes of visual patterns and a transformer block that captures high-level image semantics such as pattern formations. Furthermore, the hierarchical ConViT captures visual features from multiple receptive fields: the shallow layers focus on fine image details, while the deeper layers focus on image semantics. To better model the reasoning rules embedded in RPM images, an Attention-based Relational Reasoner (ARR) is proposed to establish the underlying relations among images. The ARR exploits the hidden relations among question images through an element-wise attentive reasoner. Experimental results on three RPM datasets demonstrate that the proposed HCV-ARR achieves a significant performance gain over state-of-the-art models. The source code is available at: https://github.com/wentaoheunnc/HCV-ARR.
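The abstract does not spell out how the element-wise attentive reasoner is computed, so the following is only a minimal sketch of one way element-wise attention over panel features could look, not the authors' ARR. The function name `elementwise_attention_pool` is hypothetical, and the choice to use each raw feature value as its own attention score is an assumption made for illustration; the linked repository is the reference for the actual model.

```python
import math


def elementwise_attention_pool(features):
    """Pool panel feature vectors with a separate softmax per feature dimension.

    features: list of equal-length lists of floats, one vector per RPM panel.
    Assumption (illustrative only): the raw feature value doubles as the
    attention score, so no learned parameters are involved.
    """
    if not features:
        raise ValueError("need at least one feature vector")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must have the same length")

    pooled = []
    for d in range(dim):
        # Values of dimension d across all panels.
        column = [f[d] for f in features]
        # Subtract the maximum before exponentiating for numerical stability.
        peak = max(column)
        exps = [math.exp(v - peak) for v in column]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Attention-weighted sum for this single feature dimension.
        pooled.append(sum(w * v for w, v in zip(weights, column)))
    return pooled
```

Because the softmax is taken independently for every dimension, each element of the pooled vector can attend to a different panel, which is the sense in which this sketch is element-wise rather than one weight per panel.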
