VFCionX: Bridging Large and Small Models for Robust Vulnerability-Fixing Commit Identification

Authors

  • Xing Cui, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
  • Jingzheng Wu, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
  • Wenxiang Ou, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
  • Tianyue Luo, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
  • Zhiyuan Li, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
  • Xiang Ling, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; Quan Cheng Laboratory, Jinan 250103, Shandong, China

DOI:

https://doi.org/10.1609/aaai.v40i1.36976

Abstract

Vulnerability-Fixing Commit Identification (VFCI) is a critical task in software security maintenance that aims to automatically identify code commits that patch security vulnerabilities. However, existing approaches struggle with low-quality commit messages and entangled commits, which limits their identification performance. To address these issues, we propose VFCionX, a novel VFCI framework that integrates large and small language models in a collaborative architecture. VFCionX consists of three core modules: a Message Classifier, a Patch Classifier, and an Ensemble Classifier. The Message Classifier employs a multi-source contextual augmentation strategy to enhance the quality of commit messages and fine-tunes the Qwen2.5-1.5B model, significantly improving classification performance in the textual modality. The Patch Classifier combines heuristic rules with a Qwen2.5-Coder-7B-driven file selector to filter noise from entangled commits, and incorporates a line-level feature extractor based on CodeBERT and a CNN to capture local pattern differences between added and deleted code lines. The Ensemble Classifier integrates the predictions of both channels using the AdaBoost algorithm, enhancing model robustness and generalization. Experimental results on five popular C/C++ repositories comprising 24,630 commits show that VFCionX achieves an F1-score of 81.47%, outperforming the best baseline by 9.42%. Ablation studies validate the effectiveness of each component, while sensitivity analysis reveals optimal parameter settings for balancing performance and noise resilience. This work provides a new and effective solution for robust vulnerability patch identification.
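
To make the ensemble step more concrete, the following is a minimal sketch of how AdaBoost could fuse the outputs of two channels. It is not the authors' implementation: the use of decision stumps as weak learners, the two-feature input (a message-channel probability and a patch-channel probability), the function names, and the toy data are all assumptions made for illustration.

```python
import math

def stump_predict(row, feature, threshold, polarity):
    """Decision stump: vote `polarity` if the feature reaches the threshold, else the opposite."""
    return polarity if row[feature] >= threshold else -polarity

def fit_stump(X, y, w):
    """Exhaustively search for the stump with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, feature, threshold, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def adaboost_fit(X, y, rounds=3):
    """Discrete AdaBoost over stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, feature, threshold, polarity = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, feature, threshold, polarity))
        # Increase the weight of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, feature, threshold, polarity))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(row, f, t, p) for alpha, f, t, p in model)
    return 1 if score >= 0 else -1

# Hypothetical toy data: (message-channel probability, patch-channel probability).
# +1 = vulnerability-fixing commit, -1 = other commit. Neither channel alone
# separates the classes with a single threshold.
X = [(0.9, 0.8), (0.2, 0.9), (0.8, 0.3), (0.3, 0.2), (0.1, 0.4), (0.4, 0.1)]
y = [1, 1, 1, -1, -1, -1]

model = adaboost_fit(X, y, rounds=3)
predictions = [adaboost_predict(model, row) for row in X]
```

In this toy setting, the second sample has an uninformative message score and the third a noisy patch score, yet the weighted vote still labels both as fixes, which is the kind of cross-channel compensation an ensemble is meant to provide.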

Published

2026-03-14

How to Cite

Cui, X., Wu, J., Ou, W., Luo, T., Li, Z., & Ling, X. (2026). VFCionX: Bridging Large and Small Models for Robust Vulnerability-Fixing Commit Identification. Proceedings of the AAAI Conference on Artificial Intelligence, 40(1), 166–174. https://doi.org/10.1609/aaai.v40i1.36976

Section

AAAI Technical Track on Application Domains I