DeepOR: A Deep Reasoning Foundation Model for Optimization Modeling

Authors

  • Ziyang Xiao (Zhejiang University)
  • Yuan Jessica Wang (School of Business, Singapore University of Social Sciences)
  • Xiongwei Han (Huawei Noah’s Ark Lab)
  • Shisi Guan (Zhejiang University)
  • Jingyan Zhu (Zhejiang University)
  • Jingrong Xie (Zhejiang University)
  • Lilin Xu (Zhejiang University)
  • Han Wu (Huawei Noah’s Ark Lab)
  • Wing Yin Yu (Huawei Noah’s Ark Lab)
  • Zehua Liu (Huawei Noah’s Ark Lab)
  • Xiaojin Fu (Huawei Noah’s Ark Lab)
  • Gang Chen (Zhejiang University)
  • Dongxiang Zhang (Zhejiang University)

DOI:

https://doi.org/10.1609/aaai.v40i40.40699

Abstract

Optimization modeling plays a critical role in supporting optimal decision-making across various domains. Prior work has shown that large language models (LLMs) tailored for optimization modeling can significantly automate and simplify this process. However, these models typically employ a straightforward input-output paradigm and struggle with challenging instances. In contrast, recent general-purpose reasoning LLMs (RLLMs), such as DeepSeek-R1, have shown impressive capabilities in complex domains like mathematics and coding. In this paper, we introduce DeepOR, the first RLLM specifically designed for optimization modeling. Instead of directly outputting solutions, DeepOR explicitly performs multiple intermediate reasoning steps. To adapt a base LLM into an RLLM, we begin by synthesizing long chain-of-thought (CoT) data guided by a flowchart, which is automatically generated using a self-exploration algorithm. Once the training data are prepared, we apply supervised fine-tuning to the base LLM to endow it with reasoning capabilities tailored for optimization modeling. To fully leverage the model's reasoning potential, we further apply reinforcement learning with reward shaping derived from solver feedback. Experimental results on benchmarks confirm that DeepOR consistently and significantly outperforms existing state-of-the-art approaches.
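
To make the idea of reward shaping from solver feedback more concrete, here is a minimal Python sketch. It is not the reward used in the paper, which the abstract does not specify. The `SolverFeedback` record, the `shaped_reward` function, the status strings, and the tier values (0.0, 0.1, 0.3, 1.0) are all hypothetical, chosen only to show how partial credit might be assigned when generated model code runs but does not reach the reference objective.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class SolverFeedback:
    """Hypothetical summary of a solver run on generated model code."""
    executed: bool                     # did the generated code run without raising?
    status: Optional[str] = None       # assumed labels: "optimal", "infeasible", ...
    objective: Optional[float] = None  # objective value reported by the solver


def shaped_reward(fb: SolverFeedback, reference_objective: float,
                  rel_tol: float = 1e-6) -> float:
    """Map solver feedback to a scalar reward with partial credit.

    The tier values are illustrative assumptions, not values from the paper.
    """
    if not fb.executed:
        # The generated code crashed or could not be parsed.
        return 0.0
    if fb.status != "optimal" or fb.objective is None:
        # The code ran, but the solver did not report an optimal solution.
        return 0.1
    if math.isclose(fb.objective, reference_objective,
                    rel_tol=rel_tol, abs_tol=1e-9):
        # Optimal solution whose objective matches the reference answer.
        return 1.0
    # Solved to optimality, but the model encodes a different problem.
    return 0.3
```

In this sketch the intermediate tiers give a learning signal to outputs that are executable or solvable even when the final answer is wrong, which is the general motivation for shaping a reward instead of using a binary correct/incorrect signal.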

Published

2026-03-14

How to Cite

Xiao, Z., Wang, Y. J., Han, X., Guan, S., Zhu, J., Xie, J., … Zhang, D. (2026). DeepOR: A Deep Reasoning Foundation Model for Optimization Modeling. Proceedings of the AAAI Conference on Artificial Intelligence, 40(40), 34052–34060. https://doi.org/10.1609/aaai.v40i40.40699

Issue

Section

AAAI Technical Track on Natural Language Processing V