DeepOR: A Deep Reasoning Foundation Model for Optimization Modeling

Ziyang Xiao; Yuan Jessica Wang; Xiongwei Han; Shisi Guan; Jingyan Zhu; Jingrong Xie; Lilin Xu; Han Wu; Wing Yin Yu; Zehua Liu; Xiaojin Fu; Gang Chen; Dongxiang Zhang

doi:10.1609/aaai.v40i40.40699

Authors

Ziyang Xiao Zhejiang University
Yuan Jessica Wang School of Business, Singapore University of Social Sciences
Xiongwei Han Huawei Noah’s Ark Lab
Shisi Guan Zhejiang University
Jingyan Zhu Zhejiang University
Jingrong Xie Zhejiang University
Lilin Xu Zhejiang University
Han Wu Huawei Noah’s Ark Lab
Wing Yin Yu Huawei Noah’s Ark Lab
Zehua Liu Huawei Noah’s Ark Lab
Xiaojin Fu Huawei Noah’s Ark Lab
Gang Chen Zhejiang University
Dongxiang Zhang Zhejiang University

Abstract

Optimization modeling plays a critical role in supporting optimal decision-making across various domains. Previous work has demonstrated that large language models (LLMs) tailored for optimization modeling have significantly automated and simplified this process. However, these models typically employ a straightforward input-output paradigm and struggle with challenging instances. In contrast, recent advances in general-purpose reasoning LLMs (RLLMs), such as DeepSeek-R1, have shown impressive capabilities in complex domains like mathematics and coding. In this paper, we introduce DeepOR, the first RLLM specifically designed for optimization modeling. Instead of directly outputting solutions, DeepOR explicitly performs multiple intermediate reasoning steps. To adapt a base LLM into an RLLM, we begin by synthesizing long chain-of-thought (CoT) data guided by a flowchart, which is automatically generated using a self-exploration algorithm. Once the training data are prepared, we employ supervised fine-tuning on the base LLM to endow it with reasoning capabilities tailored for optimization modeling. To fully leverage the model's reasoning potential, we further apply reinforcement learning with reward shaping derived from solver feedback. Experimental results on benchmarks confirm that DeepOR consistently and significantly outperforms existing state-of-the-art approaches.
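
The abstract does not specify how the reward is shaped from solver feedback, so the following is only an illustrative sketch of the general idea, not the authors' reward function. It assumes a hypothetical `SolverFeedback` record (whether the generated modeling code ran, the solver status, and the objective value) and hypothetical reward tiers of 0.0, 0.2, 0.5, and 1.0, which grant partial credit for intermediate progress instead of a single pass/fail signal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SolverFeedback:
    """Hypothetical summary of what a solver run reported for one model output."""
    executed: bool                     # did the generated modeling code run at all?
    status: str                        # e.g. "optimal", "infeasible", "error"
    objective: Optional[float] = None  # objective value, if the solver returned one


def shaped_reward(fb: SolverFeedback, reference: float, rel_tol: float = 1e-4) -> float:
    """Map solver feedback to a graded reward (all tier values are assumptions)."""
    if not fb.executed:
        return 0.0  # code failed to run: no credit
    if fb.status != "optimal" or fb.objective is None:
        return 0.2  # code ran, but the solver found no optimal solution
    # Relative gap to the reference objective, guarded against tiny references.
    gap = abs(fb.objective - reference) / max(1.0, abs(reference))
    if gap <= rel_tol:
        return 1.0  # objective matches the reference answer
    return 0.5      # solved to optimality, but the model is likely wrong


if __name__ == "__main__":
    print(shaped_reward(SolverFeedback(True, "optimal", 10.0), reference=10.0))
```

In a reinforcement learning loop, a value like this would be computed per sampled response after executing the generated model with a solver; the graded tiers are one way to give the policy a learning signal even when the final answer is incorrect.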
