Chain-of-Thought Improves Text Generation with Citations in Large Language Models


  • Bin Ji National University of Singapore
  • Huijun Liu National University of Singapore
  • Mingzhe Du National University of Singapore
  • See-Kiong Ng National University of Singapore



Keywords: NLP: (Large) Language Models, NLP: Generation


Previous studies have shown that Large Language Models (LLMs) suffer from hallucinations when generating text, giving rise to a novel and challenging research topic: enabling LLMs to generate text with citations. Existing work identifies two limitations when LLMs generate answers to questions from provided documents: unsatisfactory answer correctness and poor citation quality. To tackle these issues, we investigate using Chain-of-Thought (CoT) prompting to elicit LLMs’ ability to synthesize correct answers from multiple documents and to cite those documents properly. Moreover, we propose a Citation Insurance Mechanism, which enables LLMs to detect and recover missing citations. We conduct experiments on the ALCE benchmark with six open-source LLMs. Experimental results demonstrate that: (1) the CoT prompting strategy significantly improves the quality of text generation with citations; (2) the Citation Insurance Mechanism delivers impressive gains in citation quality at a low cost; (3) our best approach performs comparably to the previous best ChatGPT-based baselines. Extensive analyses further validate the effectiveness of the proposed approach.




How to Cite

Ji, B., Liu, H., Du, M., & Ng, S.-K. (2024). Chain-of-Thought Improves Text Generation with Citations in Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 38(16), 18345-18353.



AAAI Technical Track on Natural Language Processing I