Towards Balanced Defect Prediction with Better Information Propagation

Authors

  • Xianda Zheng, School of Cyber Science and Engineering, Southeast University, Nanjing, China
  • Yuan-Fang Li, Faculty of Information Technology, Monash University, Melbourne, Australia
  • Huan Gao, Microsoft Asia-Pacific Research and Development Group, Suzhou, China
  • Yuncheng Hua, School of Computer Science and Engineering, Southeast University, Nanjing, China
  • Guilin Qi, School of Cyber Science and Engineering, Southeast University, Nanjing, China; School of Computer Science and Engineering, Southeast University, Nanjing, China; Key Laboratory of Computer Network and Information Integration, Southeast University, Nanjing, China

DOI:

https://doi.org/10.1609/aaai.v35i1.16157

Keywords:

Software Engineering, Graph-based Machine Learning

Abstract

Defect prediction, the task of predicting the presence of defects in source code artifacts, has broad application in software development. It faces two major challenges: label scarcity, where only a small percentage of code artifacts are labeled, and data imbalance, where the majority of labeled artifacts are non-defective. Moreover, current defect prediction methods ignore information propagation among code artifacts, and this neglect degrades performance. In this paper, we propose DPCAG, a novel model that addresses all three issues. We treat code artifacts as nodes in a graph and learn to propagate influence among neighboring nodes iteratively in an EM framework. DPCAG dynamically adjusts the contribution of each node and selects high-confidence nodes for data augmentation. Experimental results on real-world benchmark datasets show that DPCAG improves performance compared to state-of-the-art models. In particular, DPCAG achieves substantially better results when measured by the Matthews Correlation Coefficient (MCC), a metric widely acknowledged to be the most suitable for imbalanced data.
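
As a brief illustration of why MCC is informative on imbalanced data, the sketch below computes MCC and accuracy from confusion-matrix counts using the standard MCC formula. It is not taken from the paper: the function names and the two example classifiers are hypothetical, and the counts are made up purely to show the contrast between the two metrics.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when any marginal count is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified artifacts."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical test set: 95 non-defective and 5 defective artifacts.
# A trivial classifier that labels everything as non-defective.
trivial = dict(tp=0, tn=95, fp=0, fn=5)
# A hypothetical classifier that finds 4 of 5 defects with 5 false alarms.
useful = dict(tp=4, tn=90, fp=5, fn=1)

for name, counts in [("trivial", trivial), ("useful", useful)]:
    print(f"{name}: accuracy={accuracy(**counts):.2f}, MCC={mcc(**counts):.3f}")
```

In this made-up example the trivial classifier scores higher on accuracy even though it never detects a defect, while its MCC is zero; the classifier that actually finds defects is rewarded only by MCC.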

Published

2021-05-18

How to Cite

Zheng, X., Li, Y.-F., Gao, H., Hua, Y., & Qi, G. (2021). Towards Balanced Defect Prediction with Better Information Propagation. Proceedings of the AAAI Conference on Artificial Intelligence, 35(1), 759-767. https://doi.org/10.1609/aaai.v35i1.16157

Section

AAAI Technical Track on Application Domains