Performance Evaluation in Machine Learning: The Good, the Bad, the Ugly, and the Way Forward

Authors

  • Peter Flach, University of Bristol

DOI:

https://doi.org/10.1609/aaai.v33i01.33019808

Abstract

This paper gives an overview of some ways in which our understanding of performance evaluation measures for machine-learned classifiers has improved over the last twenty years. I also highlight a range of areas where this understanding is still lacking, leading to ill-advised practices in classifier evaluation. This suggests that in order to make further progress we need to develop a proper measurement theory of machine learning. I then demonstrate by example what such a measurement theory might look like and what kinds of new results it would entail. Finally, I argue that key properties such as classification ability and data set difficulty are unlikely to be directly observable, suggesting the need for latent-variable models and causal inference.

Published

2019-07-17

How to Cite

Flach, P. (2019). Performance Evaluation in Machine Learning: The Good, the Bad, the Ugly, and the Way Forward. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 9808-9814. https://doi.org/10.1609/aaai.v33i01.33019808

Section

Senior Member Presentation Track: Summary Talks