AdaCCD: Adaptive Semantic Contrasts Discovery Based Cross Lingual Adaptation for Code Clone Detection

Authors

  • Yangkai Du, Zhejiang University
  • Tengfei Ma, Stony Brook University
  • Lingfei Wu, Anytime.AI
  • Xuhong Zhang, Zhejiang University
  • Shouling Ji, Zhejiang University

DOI:

https://doi.org/10.1609/aaai.v38i16.29749

Keywords:

NLP: Applications, NLP: Sentence-level Semantics, Textual Inference, etc., NLP: Other

Abstract

Code Clone Detection, which aims to retrieve functionally similar programs from large code bases, has been attracting increasing attention. Modern software often involves a diverse range of programming languages. However, current code clone detection methods are generally limited to only a few popular programming languages due to insufficient annotated data as well as their own model design constraints. To address these issues, we present AdaCCD, a novel cross-lingual adaptation method that can detect cloned code in a new language without annotations in that language. AdaCCD leverages language-agnostic code representations from pre-trained programming language models and proposes an Adaptively Refined Contrastive Learning framework to transfer knowledge from resource-rich languages to resource-poor languages. We evaluate the cross-lingual adaptation results of AdaCCD by constructing a multilingual code clone detection benchmark consisting of 5 programming languages. AdaCCD achieves significant improvements over other baselines and achieves performance comparable to supervised fine-tuning.

Published

2024-03-24

How to Cite

Du, Y., Ma, T., Wu, L., Zhang, X., & Ji, S. (2024). AdaCCD: Adaptive Semantic Contrasts Discovery Based Cross Lingual Adaptation for Code Clone Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 38(16), 17942-17950. https://doi.org/10.1609/aaai.v38i16.29749

Section

AAAI Technical Track on Natural Language Processing I