Convolutional Neural Networks over Tree Structures for Programming Language Processing

Authors

  • Lili Mou Peking University
  • Ge Li Peking University
  • Lu Zhang Peking University
  • Tao Wang Stanford University
  • Zhi Jin Peking University

DOI:

https://doi.org/10.1609/aaai.v30i1.10139

Keywords:

deep learning, neural network, program analysis

Abstract

Programming language processing (similar to natural language processing) is an active research topic in the field of software engineering; it has also attracted growing interest in the artificial intelligence community. However, unlike a natural language sentence, a program contains rich, explicit, and complicated structural information. Hence, traditional NLP models may be inappropriate for programs. In this paper, we propose a novel tree-based convolutional neural network (TBCNN) for programming language processing, in which a convolution kernel is designed over programs' abstract syntax trees to capture structural information. TBCNN is a generic architecture for programming language processing; our experiments show its effectiveness in two different program analysis tasks: classifying programs according to functionality, and detecting code snippets of certain patterns. TBCNN outperforms baseline methods, including several neural models for NLP.
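The abstract says only that TBCNN slides a convolution kernel over a program's abstract syntax tree (AST). The pure-Python sketch below illustrates one way such a tree-based convolution could work; it is not the authors' implementation, and its details are assumptions: each window covers a node plus its direct children, three shared weight matrices (hypothetically named `w_t`, `w_l`, `w_r`) are interpolated according to each child's left-to-right position, outputs pass through tanh, and max pooling over all nodes turns a tree of any size into a fixed-size vector.

```python
import math
from dataclasses import dataclass, field
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (out_dim, in_dim)


@dataclass
class Node:
    """Hypothetical AST node carrying a feature vector (e.g. an embedding of its type)."""
    vec: Vector
    children: List["Node"] = field(default_factory=list)


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def mix(w_t: Matrix, w_l: Matrix, w_r: Matrix,
        eta_t: float, eta_l: float, eta_r: float) -> Matrix:
    """Interpolate the three shared weight matrices with the given coefficients."""
    return [
        [eta_t * t + eta_l * l + eta_r * r for t, l, r in zip(row_t, row_l, row_r)]
        for row_t, row_l, row_r in zip(w_t, w_l, w_r)
    ]


def conv_at(node: Node, w_t: Matrix, w_l: Matrix, w_r: Matrix, bias: Vector) -> Vector:
    """Convolve one window: the node itself plus its direct children (assumed depth 2)."""
    acc = matvec(w_t, node.vec)  # assumption: the window's top node uses w_t only
    n = len(node.children)
    for i, child in enumerate(node.children):
        # Children interpolate between w_l and w_r by left-to-right position;
        # an only child is split evenly between the two.
        eta_r = 0.5 if n == 1 else i / (n - 1)
        eta_l = 1.0 - eta_r
        w = mix(w_t, w_l, w_r, 0.0, eta_l, eta_r)
        acc = [a + b for a, b in zip(acc, matvec(w, child.vec))]
    return [math.tanh(a + b) for a, b in zip(acc, bias)]


def tree_conv_max_pool(root: Node, w_t: Matrix, w_l: Matrix, w_r: Matrix,
                       bias: Vector) -> Vector:
    """Apply the kernel at every AST node, then max-pool each feature dimension."""
    outputs: List[Vector] = []
    stack = [root]
    while stack:
        node = stack.pop()
        outputs.append(conv_at(node, w_t, w_l, w_r, bias))
        stack.extend(node.children)
    # Max pooling yields a fixed-size vector regardless of the tree's size or shape.
    return [max(column) for column in zip(*outputs)]
```

For example, with 2×2 weight matrices and a 2-element bias, `tree_conv_max_pool(Node([0.1, 0.9], [Node([0.5, -0.5])]), w_t, w_l, w_r, bias)` returns a 2-element vector however many nodes the tree has. Interpolating a small set of shared matrices, rather than learning one matrix per child position, is what lets a single kernel handle AST nodes with any number of children.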

Published

2016-02-21

How to Cite

Mou, L., Li, G., Zhang, L., Wang, T., & Jin, Z. (2016). Convolutional Neural Networks over Tree Structures for Programming Language Processing. Proceedings of the AAAI Conference on Artificial Intelligence, 30(1). https://doi.org/10.1609/aaai.v30i1.10139

Issue

Vol. 30 No. 1 (2016)

Section

Technical Papers: Machine Learning Applications