VFCionX: Bridging Large and Small Models for Robust Vulnerability-Fixing Commit Identification

Xing Cui; Jingzheng Wu; Wenxiang Ou; Tianyue Luo; Zhiyuan Li; Xiang Ling

doi:10.1609/aaai.v40i1.36976

Authors

Xing Cui: Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
Jingzheng Wu: Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
Wenxiang Ou: Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
Tianyue Luo: Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
Zhiyuan Li: Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
Xiang Ling: Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; Quan Cheng Laboratory, Jinan 250103, Shandong, China

DOI: https://doi.org/10.1609/aaai.v40i1.36976

Abstract

Vulnerability-Fixing Commit Identification (VFCI) is a critical task in software security maintenance that aims to automatically identify the code commits that patch security vulnerabilities. However, existing approaches struggle with low-quality commit messages and entangled commits, both of which limit identification performance. To address these issues, we propose VFCionX, a novel VFCI framework that integrates large and small language models in a collaborative architecture. VFCionX consists of three core modules: a Message Classifier, a Patch Classifier, and an Ensemble Classifier. The Message Classifier employs a multi-source contextual augmentation strategy to enhance the quality of commit messages and fine-tunes the Qwen2.5-1.5B model, significantly improving classification performance in the textual modality. The Patch Classifier combines heuristic rules with a Qwen2.5-Coder-7B-driven file selector to filter noise from entangled commits, and incorporates a line-level feature extractor based on CodeBERT and a CNN to capture local pattern differences between added and deleted code lines. The Ensemble Classifier integrates the predictions of both channels using the AdaBoost algorithm, enhancing model robustness and generalization. Experimental results on five popular C/C++ repositories comprising 24,630 commits show that VFCionX achieves an F1-score of 81.47%, outperforming the best baseline by 9.42%. Ablation studies validate the effectiveness of each component, while sensitivity analysis reveals optimal parameter settings for balancing performance and noise resilience. This work provides a new and effective solution for robust vulnerability patch identification.
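
The ensemble step described in the abstract can be illustrated with a minimal sketch. The abstract does not state what features or base learners the Ensemble Classifier uses, so everything below is an assumption made for illustration: each commit is represented by two hypothetical scores (a message-channel probability and a patch-channel probability), the weak learners are single-threshold decision stumps, and the toy data is invented. It is a plain discrete AdaBoost, not the authors' implementation.

```python
import math

def fit_stump(X, y, w):
    """Find the threshold stump (feature, threshold, polarity, error)
    with the lowest weighted error on the training rows."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for pol in (1, -1):
                preds = [pol if row[f] >= t else -pol for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[3]:
                    best = (f, t, pol, err)
    return best

def stump_predict(stump, row):
    f, t, pol, _ = stump
    return pol if row[f] >= t else -pol

def adaboost_fit(X, y, rounds=3):
    """Discrete AdaBoost: reweight samples so later stumps focus on
    the commits that earlier stumps misclassified."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[3], 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)      # stump's vote weight
        model.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in model)
    return 1 if score >= 0 else -1

# Hypothetical toy data: (message_score, patch_score) per commit.
# Label +1 = vulnerability-fixing commit, -1 = other commit.
X = [(0.9, 0.8), (0.8, 0.3), (0.2, 0.9), (0.7, 0.7),
     (0.1, 0.2), (0.3, 0.1), (0.4, 0.3), (0.2, 0.4)]
y = [1, 1, 1, 1, -1, -1, -1, -1]

model = adaboost_fit(X, y, rounds=3)
predictions = [adaboost_predict(model, row) for row in X]
```

In this toy set, some fixing commits have an uninformative message but a telling patch, and vice versa, so no single threshold on one channel separates the classes. Boosting over both channels lets a strong signal in either one decide the outcome, which is the kind of robustness the abstract attributes to combining the two classifiers.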
