Explore What LLM Does Not Know in Complex Question Answering

Authors

  • Xin Lin School of Computer Science and Technology, University of Science and Technology of China, Hefei, China State Key Laboratory of Cognitive Intelligence, Hefei, China
  • Zhenya Huang School of Computer Science and Technology, University of Science and Technology of China, Hefei, China State Key Laboratory of Cognitive Intelligence, Hefei, China Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
  • Zhiqiang Zhang Independent Researcher
  • Jun Zhou Zhejiang University, Hangzhou, China
  • Enhong Chen School of Computer Science and Technology, University of Science and Technology of China, Hefei, China State Key Laboratory of Cognitive Intelligence, Hefei, China

DOI:

https://doi.org/10.1609/aaai.v39i23.34638

Abstract

Complex question answering (QA) is a challenging task in artificial intelligence research that requires reasoning over related knowledge. Retrieval-augmented generation (RAG) based on large language models (LLMs) has become a promising solution for QA. To facilitate RAG more effectively, the LLM needs to precisely evaluate the knowledge required for QA. That is, first, the LLM needs to examine its knowledge boundary (what the LLM does not know) so that external knowledge can be retrieved as a supplement. Second, the LLM needs to evaluate the utility of the retrieved knowledge (whether it helps in reasoning) for robust RAG. To this end, in this paper, we propose a novel Question Answering with Knowledge Evaluation (KEQA) framework to promote the effectiveness and efficiency of RAG in QA. First, inspired by classroom quizzes, we propose a quiz-based method to precisely examine the knowledge state of the uninterpretable LLM for QA. We pose an indicative quiz for each piece of required knowledge and inspect whether the LLM can answer it consistently, thereby probing its knowledge boundary. Second, we retrieve the unknown knowledge from external sources and evaluate its utility to pick the helpful pieces for reasoning. We design a reasoning-based metric to evaluate utility, and we construct a demonstration set from the training data as a reference to guide knowledge picking at inference time. We conduct extensive experiments on four widely used QA datasets, and the results demonstrate the effectiveness of the proposed method.
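The quiz-based knowledge-boundary check described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `toy_llm` stand-in, the quiz strings, and the consistency threshold are all hypothetical; the paper's actual quiz construction and consistency criterion may differ.

```python
import itertools
from collections import Counter
from typing import Callable, List

def examine_knowledge_boundary(
    llm: Callable[[str], str],
    quizzes: List[str],
    n_samples: int = 5,
    consistency_threshold: float = 0.8,
) -> List[str]:
    """Return the quizzes whose knowledge the model appears NOT to hold.

    A piece of knowledge is treated as 'known' when repeated sampling of
    the quiz yields the same answer at least `consistency_threshold` of
    the time; inconsistent answers mark it for external retrieval.
    """
    unknown = []
    for quiz in quizzes:
        answers = [llm(quiz) for _ in range(n_samples)]
        top_count = Counter(answers).most_common(1)[0][1]
        if top_count / n_samples < consistency_threshold:
            unknown.append(quiz)
    return unknown

# Toy stand-in for an LLM: consistent on one quiz, noisy on the other.
_noise = itertools.cycle(["Paris", "Lyon", "Paris", "Nice", "Lyon"])

def toy_llm(quiz: str) -> str:
    if "capital of France" in quiz:
        return "Paris"        # always the same answer -> known
    return next(_noise)       # answers disagree -> unknown

quizzes = ["What is the capital of France?", "Who founded ExampleCorp?"]
print(examine_knowledge_boundary(toy_llm, quizzes, n_samples=5))
# -> ['Who founded ExampleCorp?']
```

Only the quizzes flagged as unknown would then be sent to the retriever, after which a utility metric filters the retrieved passages before reasoning.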

Published

2025-04-11

How to Cite

Lin, X., Huang, Z., Zhang, Z., Zhou, J., & Chen, E. (2025). Explore What LLM Does Not Know in Complex Question Answering. Proceedings of the AAAI Conference on Artificial Intelligence, 39(23), 24585-24594. https://doi.org/10.1609/aaai.v39i23.34638

Section

AAAI Technical Track on Natural Language Processing II