Enhancing Fairness in LLM Evaluations: Unveiling and Mitigating Biases in Standard-Answer-Based Evaluations
DOI:
https://doi.org/10.1609/aaaiss.v4i1.31771
Abstract
Large Language Models (LLMs) are recognized for their effectiveness in comparing two answers. However, LLMs can still exhibit biases when comparing one answer to a standard answer, particularly in real-world scenarios like new employee orientations. This paper identifies positional and verbosity biases in LLM evaluators in such contexts. To mitigate these biases, we apply Chain of Thought prompting and Multi-Agent Debate strategies. Our research reveals that bias prevalence varies among different models, indicating the need for tailored approaches to ensure unbiased and constructive feedback.
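As an illustration of the kind of evaluation setting the abstract describes, the sketch below (not taken from the paper) shows one way positional bias could be probed when an LLM judge compares a candidate answer against a standard answer, using a Chain-of-Thought-style grading prompt. The `query_llm` helper, the prompt wording, and the PASS/FAIL verdict format are assumptions made for the example only.

```python
# Minimal sketch, assuming a generic chat-completion backend: probe for positional
# bias by asking the judge twice, swapping which answer is shown first, and
# flagging disagreement between the two verdicts.

def query_llm(prompt: str) -> str:
    """Hypothetical placeholder for an actual LLM API call; returns raw model text."""
    raise NotImplementedError

COT_JUDGE_PROMPT = """You are grading an answer against a reference (standard) answer.
First reason step by step about factual accuracy, completeness, and relevance,
ignoring answer length and the order in which the answers appear.
Then output a verdict on the last line: either "PASS" or "FAIL".

{first_label}:
{first_text}

{second_label}:
{second_text}
"""

def judge(standard: str, candidate: str, candidate_first: bool) -> str:
    """Ask the judge once, showing the candidate either before or after the standard answer."""
    if candidate_first:
        prompt = COT_JUDGE_PROMPT.format(
            first_label="Candidate answer", first_text=candidate,
            second_label="Standard answer", second_text=standard)
    else:
        prompt = COT_JUDGE_PROMPT.format(
            first_label="Standard answer", first_text=standard,
            second_label="Candidate answer", second_text=candidate)
    reply = query_llm(prompt)
    return reply.strip().splitlines()[-1]  # verdict is expected on the last line

def positional_bias_detected(standard: str, candidate: str) -> bool:
    """Flag positional bias if the verdict flips when the presentation order flips."""
    return judge(standard, candidate, True) != judge(standard, candidate, False)
```

A verbosity-bias check could follow the same pattern, comparing verdicts for a concise candidate answer and a padded paraphrase of it; the paper's Multi-Agent Debate strategy would instead aggregate verdicts from several judge instances.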
Published
2024-11-08
How to Cite
Jiao, T., Zhang, J., Xu, K., Li, R., Du, X., Wang, S., & Song, Z. (2024). Enhancing Fairness in LLM Evaluations: Unveiling and Mitigating Biases in Standard-Answer-Based Evaluations. Proceedings of the AAAI Symposium Series, 4(1), 56-59. https://doi.org/10.1609/aaaiss.v4i1.31771
Section
AI Trustworthiness and Risk Assessment for Challenging Contexts (ATRACC)