InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct

Authors

  • Yutong Wu — SKL of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences
  • Di Huang — SKL of Processors, Institute of Computing Technology, CAS
  • Wenxuan Shi — SKL of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences
  • Wei Wang — Baidu Inc., Beijing, China
  • Yewen Pu — Autodesk Research
  • Lingzhe Gao — Baidu Inc., Beijing, China
  • Shihao Liu — Baidu Inc., Beijing, China
  • Ziyuan Nan — SKL of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences
  • Kaizhao Yuan — SKL of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences
  • Rui Zhang — SKL of Processors, Institute of Computing Technology, CAS
  • Xishan Zhang — SKL of Processors, Institute of Computing Technology, CAS
  • Zidong Du — SKL of Processors, Institute of Computing Technology, CAS
  • Qi Guo — SKL of Processors, Institute of Computing Technology, CAS
  • Dawei Yin — Baidu Inc., Beijing, China
  • Xing Hu — SKL of Processors, Institute of Computing Technology, CAS
  • Yunji Chen — SKL of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v39i24.34742

Abstract

Recent advancements in open-source code large language models (LLMs) have been driven by fine-tuning on data generated from powerful closed-source LLMs, which is expensive to obtain. This paper explores whether a fine-tuned open-source model can generate additional data to augment its own instruction-tuning dataset. We make two observations: (1) a code snippet can serve as the response to different instructions, and (2) instruction-tuned code LLMs perform better at translating code into instructions than the reverse. Based on these observations, we propose Inverse-Instruct, a data augmentation technique that uses a fine-tuned LLM to generate additional instructions for the code responses in its own training dataset. The additional instruction-response pairs are added to the original dataset, and a stronger code LLM is obtained by fine-tuning on the augmented dataset. We empirically validate Inverse-Instruct on a range of open-source code models (e.g., CodeLlama-Python and DeepSeek-Coder) and benchmarks (e.g., HumanEval(+), MBPP(+), DS-1000, and MultiPL-E), showing that it consistently improves the base models.
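The augmentation loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `summarize_code` is a hypothetical placeholder for the fine-tuned LLM's code-to-instruction step, and the dataset schema (`instruction`/`response` keys) is assumed for illustration.

```python
def summarize_code(code: str) -> str:
    """Placeholder for the fine-tuned LLM translating code into a new
    instruction (the 'inverse' direction the paper exploits)."""
    first_line = code.splitlines()[0]
    return f"Write a Python function matching this signature: {first_line}"


def inverse_instruct(dataset: list[dict]) -> list[dict]:
    """Augment an instruction-tuning dataset with model-generated instructions.

    Each (instruction, response) pair contributes a second pair whose
    instruction is generated *from* the code response, exploiting the
    observation that instruction-tuned code LLMs summarize code better
    than they generate it.
    """
    augmented = list(dataset)  # keep all original pairs
    for pair in dataset:
        new_instruction = summarize_code(pair["response"])
        augmented.append({"instruction": new_instruction,
                          "response": pair["response"]})
    return augmented


seed = [{"instruction": "Sum two numbers.",
         "response": "def add(a, b):\n    return a + b"}]
augmented = inverse_instruct(seed)
```

In the paper's setting, fine-tuning on `augmented` rather than `seed` is what yields the stronger model; the full method also involves filtering the generated pairs, which this sketch omits.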

Published

2025-04-11

How to Cite

Wu, Y., Huang, D., Shi, W., Wang, W., Pu, Y., Gao, L., … Chen, Y. (2025). InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct. Proceedings of the AAAI Conference on Artificial Intelligence, 39(24), 25525–25533. https://doi.org/10.1609/aaai.v39i24.34742

Section

AAAI Technical Track on Natural Language Processing III