Combating Sampling Bias: A Self-Training Method in Credit Risk Models


  • Jingxian Liao, Intuit AI+Data, Intuit, Inc.; Department of Computer Science, University of California, Davis
  • Wei Wang, Intuit AI+Data, Intuit, Inc.
  • Jason Xue, Intuit AI+Data, Intuit, Inc.
  • Anthony Lei, QuickBooks Capital, Intuit, Inc.
  • Xue Han, Intuit AI+Data, Intuit, Inc.
  • Kun Lu, Intuit AI+Data, Intuit, Inc.



Credit Risk, Self-training, Reject Inference


A significant challenge in credit risk models for underwriting is the presence of bias in model training data. Most credit risk models are built using only applicants who were funded for credit. Such non-random sampling, predominantly influenced by credit policymakers and previous loan performance, may introduce sampling bias into the models and thus alter their predictions of default on loan repayment when screening applications from prospective borrowers. In this paper, we propose a novel data augmentation method that identifies and pseudo-labels a subset of historically declined loan applications to mitigate sampling bias in the training data. We also introduce a new measure to assess performance from a business perspective: the loan application approval rate at various loan default rate levels. We compare our proposed methods with the original supervised learning model and with the traditional techniques used in the industry to remedy sampling issues. Experimental and early production results from the deployed model show that the self-training method, with calibrated probability as the data augmentation selection criterion, improves the ability of credit scoring to differentiate defaulting loan applications and, more importantly, can increase the loan approval rate up to 8.8% while keeping a default rate similar to that of the baselines. The results have practical implications for how future underwriting models should be developed.
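The two ideas in the abstract, selecting declined applications by calibrated probability and measuring approval rate at a fixed default rate, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the confidence thresholds `low` and `high`, and the exact definition of the metric (approve applications in order of increasing predicted default probability, and keep the largest approved set whose realized default rate stays within the target) are all assumptions made here for illustration. The metric is assumed to be evaluated on applications with known repayment outcomes.

```python
from typing import List, Sequence, Tuple


def select_pseudo_labels(
    calibrated_probs: Sequence[float],
    low: float = 0.05,   # assumed threshold: confidently non-default
    high: float = 0.95,  # assumed threshold: confidently default
) -> List[Tuple[int, int]]:
    """Pick declined applications whose calibrated default probability is
    extreme enough to pseudo-label. Returns (index, pseudo_label) pairs,
    where 1 means default and 0 means non-default."""
    selected = []
    for i, p in enumerate(calibrated_probs):
        if p <= low:
            selected.append((i, 0))
        elif p >= high:
            selected.append((i, 1))
    return selected


def approval_rate_at_default_rate(
    default_probs: Sequence[float],
    defaulted: Sequence[int],
    max_default_rate: float,
) -> float:
    """Fraction of applications that can be approved, lowest predicted risk
    first, while the realized default rate among the approved applications
    stays at or below max_default_rate."""
    n = len(default_probs)
    if n == 0:
        return 0.0
    # Rank applications from lowest to highest predicted default probability.
    order = sorted(range(n), key=lambda i: default_probs[i])
    best_approved = 0
    defaults_so_far = 0
    for n_approved, i in enumerate(order, start=1):
        defaults_so_far += defaulted[i]
        if defaults_so_far / n_approved <= max_default_rate:
            best_approved = n_approved
    return best_approved / n
```

In a self-training loop under these assumptions, the pairs returned by `select_pseudo_labels` would be added to the funded-loan training set before the model is refit, and `approval_rate_at_default_rate` would be computed at several target default rates to compare the refit model with a baseline.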




How to Cite

Liao, J., Wang, W., Xue, J., Lei, A., Han, X., & Lu, K. (2022). Combating Sampling Bias: A Self-Training Method in Credit Risk Models. Proceedings of the AAAI Conference on Artificial Intelligence, 36(11), 12566-12572.