Towards Balanced Defect Prediction with Better Information Propagation

Xianda Zheng; Yuan-Fang Li; Huan Gao; Yuncheng Hua; Guilin Qi

doi:10.1609/aaai.v35i1.16157

Authors

Xianda Zheng, School of Cyber Science and Engineering, Southeast University, Nanjing, China
Yuan-Fang Li, Faculty of Information Technology, Monash University, Melbourne, Australia
Huan Gao, Microsoft Asia-Pacific Research and Development Group, Suzhou, China
Yuncheng Hua, School of Computer Science and Engineering, Southeast University, Nanjing, China
Guilin Qi, School of Cyber Science and Engineering, Southeast University, Nanjing, China; School of Computer Science and Engineering, Southeast University, Nanjing, China; Key Laboratory of Computer Network and Information Integration, Southeast University, Nanjing, China

DOI:

https://doi.org/10.1609/aaai.v35i1.16157

Keywords:

Software Engineering, Graph-based Machine Learning

Abstract

Defect prediction, the task of predicting the presence of defects in source code artifacts, has broad application in software development. It faces two major challenges: label scarcity, where only a small percentage of code artifacts are labeled, and data imbalance, where the majority of labeled artifacts are non-defective. Moreover, current defect prediction methods ignore the impact of information propagation among code artifacts, and this omission degrades performance. In this paper, we propose DPCAG, a novel model that addresses all three issues. We treat code artifacts as nodes in a graph and learn to propagate influence among neighboring nodes iteratively in an EM framework. DPCAG dynamically adjusts the contribution of each node and selects high-confidence nodes for data augmentation. Experimental results on real-world benchmark datasets show that DPCAG improves performance compared to state-of-the-art models. In particular, DPCAG achieves substantially better performance when measured by the Matthews Correlation Coefficient (MCC), a metric that is widely acknowledged to be the most suitable for imbalanced data.
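To illustrate why MCC suits imbalanced data, the following minimal sketch computes it from confusion-matrix counts using the standard definition, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function names and the example counts (95 non-defective and 5 defective artifacts) are illustrative assumptions, not taken from the paper or its datasets.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero, a common convention
    (for example, when a classifier predicts only one class).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correctly classified artifacts."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical imbalanced set: 95 non-defective, 5 defective artifacts.
# A trivial classifier that labels everything non-defective:
trivial = dict(tp=0, tn=95, fp=0, fn=5)
# A classifier that finds 3 of the 5 defects at the cost of 5 false alarms:
useful = dict(tp=3, tn=90, fp=5, fn=2)

for name, counts in [("trivial", trivial), ("useful", useful)]:
    print(name, round(accuracy(**counts), 3), round(mcc(**counts), 3))
```

In this hypothetical case the trivial classifier scores higher on accuracy (0.95 versus 0.93) yet has an MCC of 0, while the classifier that actually detects defects has an MCC of about 0.44. MCC rewards a model only when it does well on both classes.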
