A Latent Variable Model for Discovering Bird Species Commonly Misidentified by Citizen Scientists


  • Jun Yu, Oregon State University
  • Rebecca Hutchinson, Oregon State University
  • Weng-Keen Wong, Oregon State University




Machine Learning, Probabilistic Graphical Model, Citizen Science, Crowdsourcing


Data quality is a common concern for large-scale citizen science projects like eBird. In the case of eBird, a major cause of poor-quality data is the misidentification of bird species by inexperienced contributors. A proactive approach to improving data quality is to discover commonly misidentified bird species and to teach inexperienced birders the differences between these species. To accomplish this goal, we develop a latent variable graphical model that can identify groups of bird species that eBird participants often confuse with one another. Our model is a multi-species extension of the classic occupancy-detection model in the ecology literature. This multi-species extension requires a structure learning step as well as a computationally expensive parameter learning stage, which we make efficient through a variational approximation. We show that our model not only discovers groups of misidentified species but, by including these misidentifications in the model, also achieves more accurate predictions of both species occupancy and detection.
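
For readers unfamiliar with the classic occupancy-detection model that the abstract builds on, the following is a minimal sketch of its per-site likelihood for a single species. It assumes the standard formulation from the ecology literature: a latent occupancy indicator with probability psi, visit-level detection probabilities, and no false positives. The function and parameter names (site_likelihood, psi, detection_probs, observations) are hypothetical and chosen for illustration. This is not the authors' multi-species model, which additionally lets the detection of one species depend on the presence of species it is confused with.

```python
def site_likelihood(psi, detection_probs, observations):
    """Marginal likelihood of one site's detection history under the
    classic single-species occupancy-detection model.

    psi             -- probability that the site is occupied
    detection_probs -- per-visit detection probability, given occupancy
    observations    -- per-visit 0/1 detection record
    """
    # Case 1: the site is occupied; each visit is an independent
    # Bernoulli trial with its own detection probability.
    occupied = psi
    for d, y in zip(detection_probs, observations):
        occupied *= d if y == 1 else (1.0 - d)

    # Case 2: the site is unoccupied; with no false positives, only an
    # all-zero detection history is possible.
    unoccupied = (1.0 - psi) if not any(observations) else 0.0

    # Sum over the two values of the latent occupancy variable.
    return occupied + unoccupied
```

Summing over the latent occupancy state is what makes a history of non-detections ambiguous: the species may be absent, or present but missed on every visit.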




How to Cite

Yu, J., Hutchinson, R., & Wong, W.-K. (2014). A Latent Variable Model for Discovering Bird Species Commonly Misidentified by Citizen Scientists. Proceedings of the AAAI Conference on Artificial Intelligence, 28(1). https://doi.org/10.1609/aaai.v28i1.8763



Computational Sustainability and Artificial Intelligence