LI, Xiang; LAN, Yunshi; YANG, Chao. TreeEval: Benchmark-Free Evaluation of Large Language Models through Tree Planning. Proceedings of the AAAI Conference on Artificial Intelligence, [S. l.], v. 39, n. 23, p. 24485–24493, 2025. DOI: 10.1609/aaai.v39i23.34627. Available at: https://ojs.aaai.org/index.php/AAAI/article/view/34627. Accessed: 13 May 2026.