CTX-Coder: Cross-Attention Architectures Empower LLMs for Long-Context Vulnerability Detection

Jujie Wang; Kangfeng Zheng; Bin Wu; Chunhua Wu; Yulin Yao; Jiaqi Gao; Minjiao Yang

doi:10.1609/aaai.v40i2.37087

Authors

Jujie Wang, Beijing University of Posts and Telecommunications
Kangfeng Zheng, Beijing University of Posts and Telecommunications
Bin Wu, Beijing University of Posts and Telecommunications
Chunhua Wu, Beijing University of Posts and Telecommunications
Yulin Yao, Beijing University of Posts and Telecommunications
Jiaqi Gao, Beijing University of Posts and Telecommunications
Minjiao Yang, Beijing University of Posts and Telecommunications

DOI: https://doi.org/10.1609/aaai.v40i2.37087

Abstract

Software vulnerabilities have increased sharply, making effective detection methods increasingly urgent. Although large language model (LLM)-based methods have shown promise in this task, current state-of-the-art LLM approaches struggle with functions that have long contexts. In this paper, we propose CTX-Coder, a context-enhanced vulnerability detection framework that enables LLMs to selectively focus on relevant contextual functions. To achieve this, we represent the contextual functions as embeddings and integrate them with the target code via cross-attention, thereby enhancing the model's ability to capture contextual information. Furthermore, to equip the model with the ability to recognize these embedding features, we propose a two-stage pretraining pipeline. We also introduce a new dataset, CTX-VUL, which addresses the limitations of existing datasets that either lack contextual information for vulnerable functions or are not publicly available. Extensive experiments demonstrate that CTX-Coder (10B) significantly outperforms baseline models, including ones with more parameters such as Qwen2.5-14B and SecGPT. As the input code length increases, CTX-Coder’s F1 score drops by only 5.01%, while other models degrade by 25% to 41.5%, which demonstrates strong robustness in long-context scenarios and supports the effectiveness of our design.
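The abstract describes contextual functions being represented as embeddings and combined with the target code through cross-attention. The abstract does not give implementation details, so the following is only a minimal, generic sketch of single-head scaled dot-product cross-attention in plain Python, not the CTX-Coder architecture. The function names, the absence of learned projection matrices, the residual addition, and the toy vectors are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(target_states, context_embeddings):
    """Single-head scaled dot-product cross-attention (illustrative only).

    target_states: one vector per target-code token, used as queries.
    context_embeddings: one vector per contextual function, used as both
        keys and values (assumption: no learned Q/K/V projections).
    Returns (outputs, weights), where weights[i][j] is how strongly
    target token i attends to contextual function j.
    """
    dim = len(context_embeddings[0])
    scale = math.sqrt(dim)
    outputs, all_weights = [], []
    for query in target_states:
        # Similarity between this target token and every contextual function.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in context_embeddings
        ]
        weights = softmax(scores)
        # Weighted mix of the contextual-function embeddings.
        mixed = [
            sum(w * value[j] for w, value in zip(weights, context_embeddings))
            for j in range(dim)
        ]
        # Assumption: the mixed context is added to the token state (residual).
        outputs.append([q + m for q, m in zip(query, mixed)])
        all_weights.append(weights)
    return outputs, all_weights


# Toy data: two target-token states and three contextual-function embeddings.
target = [[1.0, 0.0], [0.0, 1.0]]
context = [[4.0, 0.0], [0.0, 4.0], [1.0, 1.0]]
outputs, weights = cross_attention(target, context)
```

In this toy setup each target token places most of its attention weight on the contextual function whose embedding points in the same direction, which is the "selective focus" behavior the abstract refers to. A real model would use learned projections, multiple heads, and much higher-dimensional embeddings.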
