VERSE: Verification-based Self-Play for Code Instructions

Hao Jiang; Qi Liu; Rui Li; Yuze Zhao; Yixiao Ma; Shengyu Ye; Junyu Lu; Yu Su

doi:10.1609/aaai.v39i23.34604

Authors

Hao Jiang, State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China
Qi Liu, State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
Rui Li, State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China
Yuze Zhao, State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China
Yixiao Ma, State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China
Shengyu Ye, State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China
Junyu Lu, State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
Yu Su, Institute of Artificial Intelligence, Hefei Comprehensive National Science Center; School of Computer Science and Artificial Intelligence, Hefei Normal University

DOI:

https://doi.org/10.1609/aaai.v39i23.34604

Abstract

Instruction-tuned Code Large Language Models (Code LLMs) have excelled in diverse code-related tasks, such as program synthesis, automatic program repair, and code explanation. To collect training datasets for instruction-tuning, a popular method involves having models autonomously generate instructions and corresponding responses. However, the direct generation of responses does not ensure functional correctness, a crucial requirement for generating responses to code instructions. To overcome this, we present Verification-Based Self-Play (VERSE), aiming to enhance model proficiency in generating correct responses. VERSE establishes a robust verification framework that covers various code instructions. Employing VERSE, Code LLMs engage in self-play to generate instructions and corresponding verifications. They evaluate execution results and self-consistency as verification outcomes, using them as scores to rank generated data for self-training. Experiments show that VERSE improves multiple base Code LLMs (average 7.6%) across various languages and tasks on many benchmarks, affirming its effectiveness.
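
The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (run_candidate, execution_score, consistency_scores, rank_candidates), the toy candidates, and the equal weighting of the two signals are all assumptions made for illustration. Execution results are approximated as the fraction of verification cases a candidate passes, and self-consistency as the fraction of candidates that agree with it on a set of probe inputs.

```python
from collections import Counter


def run_candidate(source, func_name, args):
    """Run one candidate and return ("ok", value) or ("error", None)."""
    namespace = {}
    try:
        # Illustration only: a real pipeline would execute untrusted code in a sandbox.
        exec(source, namespace)
        return ("ok", namespace[func_name](*args))
    except Exception:
        return ("error", None)


def execution_score(source, func_name, tests):
    """Fraction of (args, expected) verification cases the candidate passes."""
    passed = sum(
        run_candidate(source, func_name, args) == ("ok", expected)
        for args, expected in tests
    )
    return passed / len(tests)


def consistency_scores(sources, func_name, probes):
    """For each candidate, the share of candidates with identical probe outputs."""
    signatures = [
        tuple(repr(run_candidate(src, func_name, probe)) for probe in probes)
        for src in sources
    ]
    counts = Counter(signatures)
    return [counts[sig] / len(sources) for sig in signatures]


def rank_candidates(sources, func_name, tests, probes):
    """Sort candidates by execution score plus consistency (equal weights assumed)."""
    cons = consistency_scores(sources, func_name, probes)
    scored = [
        (execution_score(src, func_name, tests) + c, src)
        for src, c in zip(sources, cons)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical candidate responses to the instruction "write add(a, b)".
candidates = [
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    return a - b",  # deliberately wrong
    "def add(a, b):\n    return b + a",
]
tests = [((1, 2), 3), ((0, 0), 0)]   # stand-ins for generated verifications
probes = [(2, 3), (5, 5)]            # inputs used only to compare outputs

ranked = rank_candidates(candidates, "add", tests, probes)
for score, source in ranked:
    print(round(score, 3), source.splitlines()[1].strip())
```

In this toy run the subtracting candidate fails one verification case and disagrees with the other two on the probes, so it ranks last; the top-ranked candidates would be the ones kept for self-training under this sketch.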
