InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct

Authors

  • Yutong Wu — SKL of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences
  • Di Huang — SKL of Processors, Institute of Computing Technology, CAS
  • Wenxuan Shi — SKL of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences
  • Wei Wang — Baidu Inc., Beijing, China
  • Yewen Pu — Autodesk Research
  • Lingzhe Gao — Baidu Inc., Beijing, China
  • Shihao Liu — Baidu Inc., Beijing, China
  • Ziyuan Nan — SKL of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences
  • Kaizhao Yuan — SKL of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences
  • Rui Zhang — SKL of Processors, Institute of Computing Technology, CAS
  • Xishan Zhang — SKL of Processors, Institute of Computing Technology, CAS
  • Zidong Du — SKL of Processors, Institute of Computing Technology, CAS
  • Qi Guo — SKL of Processors, Institute of Computing Technology, CAS
  • Dawei Yin — Baidu Inc., Beijing, China
  • Xing Hu — SKL of Processors, Institute of Computing Technology, CAS
  • Yunji Chen — SKL of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v39i24.34742

Abstract

Recent advancements in open-source code large language models (LLMs) have been driven by fine-tuning on data generated from powerful closed-source LLMs, which is expensive to obtain. This paper explores whether a fine-tuned open-source model can generate additional data to augment its own instruction-tuning dataset. We make two observations: (1) a code snippet can serve as the response to different instructions, and (2) instruction-tuned code LLMs perform better at translating code into instructions than the reverse. Based on these observations, we propose Inverse-Instruct, a data augmentation technique that uses a fine-tuned LLM to generate additional instructions for the code responses in its own training dataset. The additional instruction-response pairs are added to the original dataset, and a stronger code LLM is obtained by fine-tuning on the augmented dataset. We empirically validate Inverse-Instruct on a range of open-source code models (e.g., CodeLlama-Python and DeepSeek-Coder) and benchmarks (e.g., HumanEval(+), MBPP(+), DS-1000, and MultiPL-E), showing that it consistently improves the base models.
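The augmentation loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `summarize_code` is a hypothetical placeholder for the fine-tuned LLM's code-to-instruction step, and the dataset schema (`instruction`/`response` keys) is assumed for illustration.

```python
def summarize_code(code: str) -> str:
    """Placeholder for the fine-tuned LLM translating code into a new
    instruction (the 'inverse' direction the paper exploits)."""
    first_line = code.splitlines()[0]
    return f"Write a Python function matching this signature: {first_line}"


def inverse_instruct(dataset: list[dict]) -> list[dict]:
    """Augment an instruction-tuning dataset with model-generated instructions.

    Each (instruction, response) pair contributes a second pair whose
    instruction is generated *from* the code response, exploiting the
    observation that instruction-tuned code LLMs summarize code better
    than they generate it.
    """
    augmented = list(dataset)  # keep all original pairs
    for pair in dataset:
        new_instruction = summarize_code(pair["response"])
        augmented.append({"instruction": new_instruction,
                          "response": pair["response"]})
    return augmented


seed = [{"instruction": "Sum two numbers.",
         "response": "def add(a, b):\n    return a + b"}]
augmented = inverse_instruct(seed)
```

In the paper's setting, fine-tuning on `augmented` rather than `seed` is what yields the stronger model; the full method also involves filtering the generated pairs, which this sketch omits.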

Published

2025-04-11

How to Cite

Wu, Y., Huang, D., Shi, W., Wang, W., Pu, Y., Gao, L., … Chen, Y. (2025). InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct. Proceedings of the AAAI Conference on Artificial Intelligence, 39(24), 25525–25533. https://doi.org/10.1609/aaai.v39i24.34742

Section

AAAI Technical Track on Natural Language Processing III