DARR: A Dual-Branch Arithmetic Regression Reasoning Framework for Solving Machine Number Reasoning

Authors

  • Chengtai Li (The Digital Port Technologies Lab, School of Computer Science, University of Nottingham Ningbo China; Cixi Institute of Biomedical Engineering, Ningbo Institute of Industrial Technology, Chinese Academy of Sciences)
  • Yee Yang Tan (School of Computer Science, University of Nottingham Malaysia)
  • Yuting He (The Digital Port Technologies Lab, School of Computer Science, University of Nottingham Ningbo China; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences)
  • Jianfeng Ren (The Digital Port Technologies Lab, School of Computer Science, University of Nottingham Ningbo China; Beacons of Excellence Research and Innovation Institute, University of Nottingham Ningbo China)
  • Ruibin Bai (The Digital Port Technologies Lab, School of Computer Science, University of Nottingham Ningbo China; Beacons of Excellence Research and Innovation Institute, University of Nottingham Ningbo China)
  • Yitian Zhao (Cixi Institute of Biomedical Engineering, Ningbo Institute of Industrial Technology, Chinese Academy of Sciences)
  • Heng Yu (The Digital Port Technologies Lab, School of Computer Science, University of Nottingham Ningbo China)
  • Xudong Jiang (School of Electrical & Electronic Engineering, Nanyang Technological University)

DOI

https://doi.org/10.1609/aaai.v39i2.32127

Abstract

Abstract visual reasoning (AVR) is a critical human ability and has been widely studied, but arithmetic visual reasoning, a distinct AVR task that requires reasoning over number sense, has received less attention in the literature. To facilitate this research, we construct a Machine Number Reasoning (MNR) dataset to assess a model's ability in arithmetic visual reasoning over number sense and spatial layouts. To solve the MNR tasks, we propose a Dual-branch Arithmetic Regression Reasoning (DARR) framework, which comprises an Intra-Image Arithmetic Regression Reasoning (IIARR) module and a Cross-Image Arithmetic Regression Reasoning (CIARR) module. The IIARR includes a set of Intra-Image Regression Blocks that model candidate number orders and the underlying arithmetic rules within individual images, and an Order Gate that determines the correct number order. The CIARR establishes arithmetic relations across different images through a '3-to-1' regressor and a set of '2-to-1' regressors, with a Selection Gate that selects the most suitable '2-to-1' regressor and a gated fusion that combines the two kinds of regressors. Experiments on the MNR dataset show that DARR outperforms state-of-the-art models for arithmetic visual reasoning.
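
The cross-image branch described in the abstract can be illustrated with a toy numeric sketch. This is not the authors' implementation: the abstract does not specify the regressors, the gates, or the features, so everything below is an assumption made for illustration. In particular, the function name `ciarr_sketch`, the use of plain linear regressors on scalar inputs, a softmax as the Selection Gate, and a sigmoid as the fusion gate are all hypothetical choices.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, inputs):
    """Toy linear regressor: weighted sum of the inputs plus a bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def ciarr_sketch(features, three_to_one, two_to_one, selection_scores, fusion_logit):
    """Hypothetical cross-image regression with gating.

    features         -- three scalars, one per context image (assumed stand-in
                        for learned image features)
    three_to_one     -- (weights, bias) of the assumed '3-to-1' regressor
    two_to_one       -- three (weights, bias) pairs, one per image pair
    selection_scores -- assumed Selection Gate logits, one per '2-to-1' regressor
    fusion_logit     -- assumed logit of the fusion gate
    """
    a, b, c = features

    # '3-to-1': a single prediction from all three context values.
    y3 = linear(three_to_one[0], three_to_one[1], [a, b, c])

    # '2-to-1': one candidate prediction per pair of context values.
    pairs = [(a, b), (a, c), (b, c)]
    candidates = [linear(w, bias, list(p)) for (w, bias), p in zip(two_to_one, pairs)]

    # Selection Gate (assumed soft selection): weight the candidates.
    sel = softmax(selection_scores)
    y2 = sum(s * y for s, y in zip(sel, candidates))

    # Gated fusion (assumed convex combination of the two kinds of regressors).
    g = sigmoid(fusion_logit)
    return g * y3 + (1.0 - g) * y2


# Example: the rule "target = a + b" is captured by both branches.
adder = ([1.0, 1.0], 0.0)
prediction = ciarr_sketch(
    features=(2.0, 3.0, 5.0),
    three_to_one=([1.0, 1.0, 0.0], 0.0),
    two_to_one=[adder, adder, adder],
    selection_scores=[100.0, 0.0, 0.0],  # gate strongly prefers the (a, b) pair
    fusion_logit=0.0,                    # equal weight on both branches
)
```

In this toy setting the Selection Gate logits and the fusion logit are fixed by hand; in a learned model they would presumably be produced from the inputs, which the abstract does not detail.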

Published

2025-04-11

How to Cite

Li, C., Tan, Y. Y., He, Y., Ren, J., Bai, R., Zhao, Y., … Jiang, X. (2025). DARR: A Dual-Branch Arithmetic Regression Reasoning Framework for Solving Machine Number Reasoning. Proceedings of the AAAI Conference on Artificial Intelligence, 39(2), 1373–1382. https://doi.org/10.1609/aaai.v39i2.32127

Section

AAAI Technical Track on Cognitive Modeling & Cognitive Systems