Cavalin, P., Domingues, P. H., & Pinhanez, C. (2025). Sentence-level Aggregation of Lexical Metrics Correlates Stronger with Human Judgements than Corpus-level Aggregation. Proceedings of the AAAI Conference on Artificial Intelligence, 39(22), 23532–23540. https://doi.org/10.1609/aaai.v39i22.34522