LC3: Long Cross-Language Code Clone Detection Enhanced by Opcode Sequences and Affinity Aggregation

Authors

  • Xilin Lan, School of Computer Science and Engineering, Central South University
  • Huan Zhang, School of Computer Science and Engineering, Central South University
  • Yang Yang, School of Computer Science and Engineering, Central South University
  • Chengwu Xue, School of Computer Science and Engineering, Central South University
  • Li Kuang, School of Computer Science and Engineering, Central South University

DOI:

https://doi.org/10.1609/aaai.v40i37.40410

Abstract

Cross-language code clone detection, which identifies functionally similar code across programming languages, is critical for ensuring synchronized evolution and reducing maintenance costs in multi-platform software development. While zero-shot approaches have emerged as a practical solution to data scarcity, state-of-the-art methods still face two major limitations: insufficient learning of language-agnostic representations and information loss when processing long code. To address these challenges, we propose LC3, a novel framework for robust zero-shot cross-language code clone detection. To address the first limitation, LC3 fuses source code with its underlying opcode sequences, leveraging a bimodal architecture and adversarial training to learn a language-agnostic representation. To address the second, LC3 introduces a semantic affinity aggregation strategy. This strategy synthesizes a robust clone score from a complete pairwise similarity matrix computed between segmented code blocks, overcoming the limitations of both simple truncation and simple aggregation. Extensive experiments show that LC3 significantly outperforms state-of-the-art zero-shot baselines, especially in challenging long-code scenarios.
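
The idea of scoring a clone pair from a complete pairwise similarity matrix over segmented code blocks can be illustrated with a minimal Python sketch. It is not the LC3 implementation: the abstract does not specify the encoder or the aggregation rule, so everything here is an assumption made for illustration. The bag-of-tokens `embed` function stands in for a learned encoder, the fixed `lines_per_block` segmentation is arbitrary, and the "average of best matches in both directions" rule in `clone_score` is one plausible way to aggregate the matrix. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def split_into_blocks(code, lines_per_block=4):
    """Split source text into consecutive blocks of non-empty lines.

    Assumption: fixed-size line blocks; the paper's segmentation may differ.
    """
    lines = [ln for ln in code.splitlines() if ln.strip()]
    return ["\n".join(lines[i:i + lines_per_block])
            for i in range(0, len(lines), lines_per_block)]


def embed(block):
    """Toy stand-in for a learned encoder: a bag of identifier/symbol tokens."""
    return Counter(re.findall(r"[A-Za-z_]\w*|\S", block))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def affinity_matrix(blocks_a, blocks_b):
    """Complete pairwise similarity matrix between two lists of blocks."""
    emb_a = [embed(b) for b in blocks_a]
    emb_b = [embed(b) for b in blocks_b]
    return [[cosine(x, y) for y in emb_b] for x in emb_a]


def clone_score(matrix):
    """Aggregate the matrix into one score (assumed rule, not the paper's):
    average every block's best match, taken in both directions."""
    if not matrix or not matrix[0]:
        return 0.0
    row_best = [max(row) for row in matrix]
    col_best = [max(col) for col in zip(*matrix)]
    return (sum(row_best) + sum(col_best)) / (len(row_best) + len(col_best))


if __name__ == "__main__":
    java_like = "int total = 0;\nfor (int x : xs) {\n  total += x;\n}\nreturn total;"
    python_like = "total = 0\nfor x in xs:\n    total += x\nreturn total"
    m = affinity_matrix(split_into_blocks(java_like, 2),
                        split_into_blocks(python_like, 2))
    print(clone_score(m))
```

Because every block of one fragment is compared with every block of the other, no part of a long input is discarded as it would be under truncation, and a few strongly matching blocks are not averaged away as they could be under a single pooled embedding.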

Published

2026-03-14

How to Cite

Lan, X., Zhang, H., Yang, Y., Xue, C., & Kuang, L. (2026). LC3: Long Cross-Language Code Clone Detection Enhanced by Opcode Sequences and Affinity Aggregation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(37), 31456–31464. https://doi.org/10.1609/aaai.v40i37.40410

Section

AAAI Technical Track on Natural Language Processing II