AP2O-Coder: Adaptively Progressive Preference Optimization for Reducing Compilation and Runtime Errors in LLM-Generated Code

Authors

  • Jianqing Zhang Shanghai Jiao Tong University
  • Wei Xia Tencent
  • Hande Dong Tencent
  • Qiang Lin Tencent
  • Jian Cao Shanghai Jiao Tong University

DOI:

https://doi.org/10.1609/aaai.v40i41.40771

Abstract

LLMs' code generation capabilities have substantially improved the effectiveness of programming tasks. However, LLM-generated code still suffers from compilation and runtime errors. Existing offline preference optimization methods primarily focus on enhancing LLMs' coding abilities using pass/fail signals in the preference data, overlooking the fine-grained error types in the failed code. To address this, we propose Adaptively Progressive Preference Optimization (AP2O) for coding (i.e., AP2O-Coder), a method that guides LLMs adaptively and methodically to reduce code errors in code generation. Specifically, we construct an error notebook from failed codes and progressively optimize the LLM to correct errors type by type. Furthermore, we adaptively replay error types to adapt to the LLM's evolving weaknesses throughout training. Through extensive experiments on both code and general LLMs (Llama, Qwen, and DeepSeek series) with parameters ranging from 0.5B to 34B, our AP2O-Coder improves code generation performance by up to 3% in pass@k while using less preference data.
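The abstract's core loop — grouping failed generations into an error notebook, then scheduling optimization over error types and replaying the types the model still struggles with — can be illustrated with a minimal sketch. This is a toy illustration of the scheduling idea only, not the paper's implementation; the function names (`build_error_notebook`, `progressive_schedule`), the data layout, and the replay threshold are all assumptions made for the example.

```python
from collections import defaultdict


def build_error_notebook(failed_samples):
    """Group failed generations by their error type (hypothetical schema:
    each sample is a dict carrying an "error_type" key such as
    "SyntaxError" or "TypeError")."""
    notebook = defaultdict(list)
    for sample in failed_samples:
        notebook[sample["error_type"]].append(sample)
    return dict(notebook)


def progressive_schedule(notebook, error_rates, threshold=0.1):
    """Order error types from most to least frequent (a plausible
    "progressive" curriculum), then re-queue ("replay") any type whose
    current error rate still exceeds `threshold` — a stand-in for the
    adaptive replay described in the abstract."""
    order = sorted(notebook, key=lambda t: len(notebook[t]), reverse=True)
    replays = [t for t in order if error_rates.get(t, 0.0) > threshold]
    return order + replays
```

For example, given two `SyntaxError` failures and one `TypeError` failure, with the model's measured `TypeError` rate still above the threshold after a pass, the schedule would visit `SyntaxError` first and replay `TypeError` at the end.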

Published

2026-03-14

How to Cite

Zhang, J., Xia, W., Dong, H., Lin, Q., & Cao, J. (2026). AP2O-Coder: Adaptively Progressive Preference Optimization for Reducing Compilation and Runtime Errors in LLM-Generated Code. Proceedings of the AAAI Conference on Artificial Intelligence, 40(41), 34701–34709. https://doi.org/10.1609/aaai.v40i41.40771

Section

AAAI Technical Track on Natural Language Processing VI