CTX-Coder: Cross-Attention Architectures Empower LLMs for Long-Context Vulnerability Detection

Authors

  • Jujie Wang, Beijing University of Posts and Telecommunications
  • Kangfeng Zheng, Beijing University of Posts and Telecommunications
  • Bin Wu, Beijing University of Posts and Telecommunications
  • Chunhua Wu, Beijing University of Posts and Telecommunications
  • Yulin Yao, Beijing University of Posts and Telecommunications
  • Jiaqi Gao, Beijing University of Posts and Telecommunications
  • Minjiao Yang, Beijing University of Posts and Telecommunications

DOI:

https://doi.org/10.1609/aaai.v40i2.37087

Abstract

Software vulnerabilities have increased sharply, underscoring the growing urgency for effective detection methods. Although large language model (LLM)-based methods have shown promise in this task, current state-of-the-art LLM approaches struggle with functions that have long contexts. In this paper, we propose CTX-Coder, a context-enhanced vulnerability detection framework that enables LLMs to selectively focus on relevant contextual functions. To achieve this, we represent the contextual functions as embeddings and integrate them with the target code via cross-attention, thereby enhancing the model's ability to capture contextual information. Furthermore, to equip the model with the ability to recognize these embedding features, we propose a two-stage pretraining pipeline. We also introduce a new dataset, CTX-VUL, which addresses the limitations of existing datasets that either lack contextual information for vulnerable functions or are not publicly available. Extensive experiments demonstrate that CTX-Coder (10B) significantly outperforms baseline models with even more parameters, such as Qwen2.5-14B and SecGPT. As the input code length increases, CTX-Coder’s F1 score drops by only 5.01%, while other models degrade by 25% to 41.5%, demonstrating strong robustness in long-context scenarios and the effectiveness of our design.
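
The cross-attention idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head, no learned projection matrices, and a plain residual connection, and the names `cross_attend`, `target_states`, and `context_embeddings` are hypothetical. Each target-code token acts as a query over one embedding per contextual function, so functions whose embeddings align with the token receive more weight.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(target_states, context_embeddings):
    """Single-head cross-attention without learned projections (an assumption).

    target_states: one hidden vector per target-code token (queries).
    context_embeddings: one vector per contextual function (keys and values).
    Returns the fused token states and the per-token attention weights.
    """
    dim = len(target_states[0])
    scale = 1.0 / math.sqrt(dim)
    fused, all_weights = [], []
    for query in target_states:
        # Scaled dot-product score between this token and each context function.
        scores = [
            scale * sum(q * k for q, k in zip(query, key))
            for key in context_embeddings
        ]
        weights = softmax(scores)
        # Weighted sum of the context embeddings (used here as values).
        attended = [
            sum(w * value[i] for w, value in zip(weights, context_embeddings))
            for i in range(dim)
        ]
        # Residual connection: add the context summary to the token state.
        fused.append([h + a for h, a in zip(query, attended)])
        all_weights.append(weights)
    return fused, all_weights


# Toy data (hypothetical): two target tokens, three contextual functions.
target_states = [[1.0, 0.0], [0.0, 1.0]]
context_embeddings = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
fused, weights = cross_attend(target_states, context_embeddings)
```

In this toy setup the first token puts most of its weight on the first contextual function and the second token on the second, which is the selective focus the abstract describes. A real model would add learned query, key, and value projections and multiple heads.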

Published

2026-03-14

How to Cite

Wang, J., Zheng, K., Wu, B., Wu, C., Yao, Y., Gao, J., & Yang, M. (2026). CTX-Coder: Cross-Attention Architectures Empower LLMs for Long-Context Vulnerability Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 40(2), 1159–1167. https://doi.org/10.1609/aaai.v40i2.37087

Section

AAAI Technical Track on Application Domains II