CONSIDER: Commonalities and Specialties Driven Multilingual Code Retrieval Framework
DOI:
https://doi.org/10.1609/aaai.v38i8.28713
Keywords:
DMKM: Applications, NLP: Other
Abstract
Multilingual code retrieval aims to find code snippets relevant to a user's query from a multilingual codebase, which plays a crucial role in software development and expands its application scenarios compared to classical monolingual code retrieval. Despite the performance improvements achieved by previous studies, two crucial problems are overlooked in the multilingual scenario. First, certain programming languages face data scarcity in specific domains, resulting in limited representation capabilities within those domains. Second, different programming languages can be used interchangeably within the same domain, making it challenging for multilingual models to accurately identify the intended programming language of a user's query. To address these issues, we propose the CommONalities and SpecIalties Driven Multilingual CodE Retrieval Framework (CONSIDER), which includes two modules. The first module enhances the representation of various programming languages by modeling pairwise and global commonalities among them. The second module introduces a novel contrastive learning negative sampling algorithm that leverages language confusion to automatically extract specific language features. Through our experiments, we confirm the significant benefits of our model in real-world multilingual code retrieval scenarios in various aspects. Furthermore, an evaluation demonstrates the effectiveness of our proposed CONSIDER framework in monolingual scenarios as well. Our source code is available at https://github.com/smsquirrel/consider.
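As a rough illustration of the kind of contrastive objective the abstract alludes to, the sketch below computes an InfoNCE-style loss for one query and draws negatives from other programming languages with probability proportional to a confusion weight. It is not the authors' algorithm: the function names, the `confusion` table, the temperature value, and the choice to favor easily confused languages when sampling are all assumptions made for the example.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sample_negatives(query_lang, candidates, confusion, k, rng):
    """Draw k negatives (with replacement) from other languages.

    candidates: list of (language, vector) pairs.
    confusion:  hypothetical dict mapping (query_lang, other_lang) to a
                non-negative weight; higher means more easily confused.
    """
    pool = [c for c in candidates if c[0] != query_lang]
    # A small floor keeps every other-language candidate reachable.
    weights = [confusion.get((query_lang, lang), 0.0) + 1e-6 for lang, _ in pool]
    return rng.choices(pool, weights=weights, k=k)


def info_nce(query, positive, negatives, temperature=0.05):
    """InfoNCE loss for one query: -log softmax score of the positive."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denom - logits[0]


rng = random.Random(0)
candidates = [
    ("python", [1.0, 0.0]),
    ("java", [0.8, 0.6]),
    ("go", [0.0, 1.0]),
]
# Assumed weights: Python queries are confused with Java more than with Go.
confusion = {("python", "java"): 0.9, ("python", "go"): 0.1}
picked = sample_negatives("python", candidates, confusion, k=4, rng=rng)
loss = info_nce([1.0, 0.1], [1.0, 0.0], [vec for _, vec in picked])
```

Weighting the sampler toward confusable languages is one plausible way to make the negatives harder; the paper's actual sampling rule may differ.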
Published
2024-03-24
How to Cite
Li, R., He, L., Liu, Q., Zhao, Y., Zhang, Z., Huang, Z., Su, Y., & Wang, S. (2024). CONSIDER: Commonalities and Specialties Driven Multilingual Code Retrieval Framework. Proceedings of the AAAI Conference on Artificial Intelligence, 38(8), 8679-8687. https://doi.org/10.1609/aaai.v38i8.28713
Issue
Section
AAAI Technical Track on Data Mining & Knowledge Management