Rethinking the Development of Large Language Models from the Causal Perspective: A Legal Text Prediction Case Study

Authors

  • Haotian Chen, Fudan University
  • Lingwei Zhang, Johns Hopkins University
  • Yiran Liu, Tsinghua University
  • Yang Yu, Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v38i19.30086

Keywords:

General

Abstract

While large language models (LLMs) exhibit impressive performance on a wide range of NLP tasks, most of them fail to learn causality from correlation, which prevents them from learning rationales for prediction. Rethinking the whole development process of LLMs is urgently needed, as they are being adopted in critical tasks that require rationales, including legal text prediction (e.g., legal judgment prediction). In this paper, we first explain the underlying theoretical mechanism of this failure and argue that both data imbalance and the omission of causality in model design and selection render the current training-testing paradigm unable to select the unique causality-based model from among correlation-based models. Second, we take the legal text prediction task as a testbed and reconstruct the development process of LLMs by simultaneously infusing causality into model architectures and organizing causality-based adversarial attacks for evaluation. Specifically, building on our theoretical analysis, we propose a causality-aware self-attention mechanism (CASAM), which prevents LLMs from entangling causal and non-causal information by restricting the interaction between causal and non-causal words. Meanwhile, we propose eight kinds of legal-domain-specific attacks to enable causality-based model selection. Our extensive experimental results demonstrate that CASAM achieves state-of-the-art (SOTA) performance and the strongest robustness on three commonly used legal text prediction benchmarks. We make our code publicly available at https://github.com/Carrot-Red/Rethink-LLM-development.
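The abstract describes CASAM only at a high level; the actual architecture and the procedure for identifying causal words are given in the paper and the linked repository. Purely as an illustrative sketch of the stated idea — restricting attention between causal and non-causal words — one possible masked self-attention step could look as follows (PyTorch; the function name, tensor shapes, and the `is_causal_token` mask are assumptions for illustration, not the authors' implementation):

```python
import torch
import torch.nn.functional as F


def casam_attention(q, k, v, is_causal_token):
    """Sketch: block attention between causal and non-causal tokens.

    q, k, v:          (batch, seq_len, d_model) query/key/value projections,
                      as in standard scaled dot-product self-attention.
    is_causal_token:  (batch, seq_len) boolean tensor, True for tokens
                      annotated as causal for the prediction (hypothetical
                      input; in the paper, causal words are identified by
                      the authors' causal analysis).
    """
    d_model = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_model ** 0.5  # (batch, seq, seq)

    # Tokens may attend only within their own group (causal-to-causal or
    # non-causal-to-non-causal), keeping the two information streams
    # disentangled. Each token matches its own group, so no row of the
    # score matrix is fully masked.
    same_group = is_causal_token.unsqueeze(2) == is_causal_token.unsqueeze(1)
    scores = scores.masked_fill(~same_group, float("-inf"))

    weights = F.softmax(scores, dim=-1)
    return weights @ v
```

This is a minimal single-head sketch under the above assumptions; a full model would apply such masking inside each transformer layer alongside the usual projections and residual connections.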

Published

2024-03-24

How to Cite

Chen, H., Zhang, L., Liu, Y., & Yu, Y. (2024). Rethinking the Development of Large Language Models from the Causal Perspective: A Legal Text Prediction Case Study. Proceedings of the AAAI Conference on Artificial Intelligence, 38(19), 20958-20966. https://doi.org/10.1609/aaai.v38i19.30086

Issue

Vol. 38 No. 19 (2024)

Section

AAAI Technical Track on Safe, Robust and Responsible AI Track