Large Language Models Struggle with Unreasonability in Math Problems

Authors

  • Jingyuan Ma State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
  • Damai Dai State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
  • Zihang Yuan Institute of Artificial Intelligence, Beihang University
  • Rui Li State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
  • Weilin Luo Huawei Noah's Ark Lab, China
  • Bin Wang Huawei Noah's Ark Lab, China
  • Qun Liu Huawei Noah's Ark Lab, China
  • Lei Sha Institute of Artificial Intelligence, Beihang University
  • Zhifang Sui State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University

DOI:

https://doi.org/10.1609/aaai.v40i38.40518

Abstract

Large Language Models (LLMs) have shown remarkable success on a wide range of math and reasoning benchmarks. However, we observe that they often struggle when faced with unreasonable math problems. Instead of recognizing these issues, models frequently proceed as if the problem were well-posed, producing incorrect answers or falling into overthinking and verbose self-correction. To systematically investigate this overlooked vulnerability, we propose the Unreasonable Math Problems (UMP) benchmark, designed to evaluate LLMs' ability to detect and respond to unreasonable math problem statements. Based on extensive experiments covering 19 LLMs, we find that even state-of-the-art general models like GPT-4o struggle on UMP. While reasoning models such as DeepSeek-R1 demonstrate higher sensitivity to unreasonable inputs, this often comes at the cost of generating overly long, meaningless responses that fail to converge. We further find that prompting and fine-tuning enhance the detection of unreasonable inputs, with minor and acceptable trade-offs, making them practical solutions in this challenging setting.

Published

2026-03-14

How to Cite

Ma, J., Dai, D., Yuan, Z., Li, R., Luo, W., Wang, B., Liu, Q., Sha, L., & Sui, Z. (2026). Large Language Models Struggle with Unreasonability in Math Problems. Proceedings of the AAAI Conference on Artificial Intelligence, 40(38), 32428-32436. https://doi.org/10.1609/aaai.v40i38.40518

Section

AAAI Technical Track on Natural Language Processing III