An Example of (Too Much) Hyper-Parameter Tuning In Suicide Ideation Detection

Authors

  • Annika Marie Schoene, Northeastern University, Institute for Experiential AI
  • John Ortega, Northeastern University, Institute for Experiential AI
  • Silvio Amir, Northeastern University, Institute for Experiential AI, Khoury College of Computer Sciences
  • Kenneth Church, Northeastern University, Institute for Experiential AI, Khoury College of Computer Sciences

DOI:

https://doi.org/10.1609/icwsm.v17i1.22227

Abstract

This work starts with the TWISCO baseline, a benchmark of suicide-related content from Twitter. We find that hyper-parameter tuning can improve this baseline by 9%. We examined 576 combinations of four hyper-parameters: learning rate, batch size, number of epochs, and date range of the training data. Reasonable settings of learning rate and batch size produce much better results than poor settings. Date range is less conclusive: balancing the date range of the training data to match the benchmark ought to improve performance, but the differences are relatively small, and even the optimal date-range settings are not far from the poor ones. Finally, we end with concerns about reproducibility. Of the 576 experiments, only 10% produced F1 performance above the baseline. It is common practice in the literature to run many experiments and report the best, but doing so may be risky, especially given the sensitive nature of Suicide Ideation Detection.
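The sweep described in the abstract can be sketched as a plain grid enumeration. The abstract reports only the total count (576 combinations), not the individual grid values, so the lists below are hypothetical placeholders whose sizes (4 × 4 × 6 × 6) were chosen to reproduce that total; `train_and_eval` is likewise a hypothetical stand-in for the actual fine-tuning routine.

```python
import itertools

# Hypothetical grids (not published on this page); sizes chosen so that
# 4 * 4 * 6 * 6 = 576, matching the number of experiments in the abstract.
learning_rates = [1e-5, 2e-5, 3e-5, 5e-5]
batch_sizes = [8, 16, 32, 64]
num_epochs = [1, 2, 3, 4, 5, 6]
date_ranges = [f"range_{i}" for i in range(6)]  # placeholder date-range labels

# Full Cartesian product: one (lr, bs, epochs, range) tuple per experiment.
grid = list(itertools.product(learning_rates, batch_sizes,
                              num_epochs, date_ranges))
print(len(grid))  # 576

# A full sweep would fine-tune and score one model per setting, then compare
# each F1 against the TWISCO baseline, e.g.:
#   f1_scores = {cfg: train_and_eval(*cfg) for cfg in grid}  # hypothetical
#   above_baseline = [cfg for cfg, f1 in f1_scores.items() if f1 > baseline_f1]
```

Reporting only `max(f1_scores.values())` from such a sweep is exactly the practice the paper cautions against: with 576 runs and only ~10% beating the baseline, the best single run overstates what a typical configuration achieves.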

Published

2023-06-02

How to Cite

Marie Schoene, A., Ortega, J., Amir, S., & Church, K. (2023). An Example of (Too Much) Hyper-Parameter Tuning In Suicide Ideation Detection. Proceedings of the International AAAI Conference on Web and Social Media, 17(1), 1158-1162. https://doi.org/10.1609/icwsm.v17i1.22227