Faithful in Steps: Improving Generalization and Citation in RAG via Query Decomposition

Yue Liu; Zhongying Ru; Shimin Di; Jipeng Zhang; Ruiyuan Zhang; Xiaofang Zhou

doi:10.1609/aaai.v40i42.40879

Authors

Yue Liu The Hong Kong University of Science and Technology (HKUST)
Zhongying Ru Independent Researcher
Shimin Di School of Computer Science and Engineering, Southeast University Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications
Jipeng Zhang The Hong Kong University of Science and Technology (HKUST)
Ruiyuan Zhang Hong Kong Generative AI Research and Development Center (HKGAI)
Xiaofang Zhou The Hong Kong University of Science and Technology (HKUST)

Abstract

Retrieval-augmented generation (RAG) is a prevalent strategy for mitigating hallucinations in LLMs. Attributable RAG (RAGQ) additionally generates quotes for its answers. These quotes indicate which input contexts support each answer, which improves the answer's verifiability and trustworthiness. However, existing RAGQ systems degrade significantly on questions that require multi-hop reasoning and multi-modal understanding: they suffer from over-citation, fail to identify implicit entities, and generalize poorly. In this paper, we propose a novel RAGQ framework, QDRAG. QDRAG breaks the input question down into atomic subquestions to identify the implicit entities. A reranker then prunes context distractors to eliminate downstream over-citation. To facilitate query decomposition (QD), we propose two zero-shot approaches, QD-C and QD-R, which guide the QD MLLM to decompose the question based on context knowledge and retrieval rewards, respectively. One interesting finding is that finetuning on the QD task generalizes better than finetuning directly on the downstream RAGQ task. Experiments on four multi-modal QA benchmarks demonstrate QDRAG's efficacy in grounding answers and generating faithful citations. The framework significantly outperforms all baselines on both in-domain and out-of-domain tests, even surpassing Gemini-Pro.
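The decompose, prune, then cite flow described in the abstract can be illustrated with a toy, text-only sketch. This is not the authors' implementation: the subquestions are written by hand rather than produced by a QD model, the reranker is replaced by a simple token-overlap score, and all function names (`tokenize`, `rerank_and_prune`, `cite_for_subquestions`) are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase word tokens; a crude stand-in for a learned reranker's features."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def rerank_and_prune(subquestion, contexts, keep=1):
    """Score each context by token overlap with the subquestion, keep the top `keep`.

    Returns (context_index, score) pairs, best first; zero-score contexts are dropped.
    """
    q = tokenize(subquestion)
    scored = [(i, len(q & tokenize(c))) for i, c in enumerate(contexts)]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [(i, s) for i, s in scored[:keep] if s > 0]

def cite_for_subquestions(subquestions, contexts, keep=1):
    """Collect the context indices that survive pruning for any subquestion."""
    cited = []
    for sq in subquestions:
        for i, _ in rerank_and_prune(sq, contexts, keep):
            if i not in cited:
                cited.append(i)
    return cited

contexts = [
    "Marie Curie was born in Warsaw.",
    "Warsaw is the capital of Poland.",
    "Paris is the capital of France.",  # distractor
]

# Hand-written (assumed) decomposition of the multi-hop question
# "Which country's capital was Marie Curie born in?"
# The second subquestion names the implicit entity (Warsaw) explicitly.
subquestions = [
    "Where was Marie Curie born?",
    "Warsaw is the capital of which country?",
]

citations = cite_for_subquestions(subquestions, contexts)
print(citations)
```

In this toy setup, only the two supporting contexts are cited and the distractor about Paris is pruned. The distractor still scores highly against the second subquestion, which is why the sketch keeps only the single best context per subquestion.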
