Rethinking the Development of Large Language Models from the Causal Perspective: A Legal Text Prediction Case Study

Authors

  • Haotian Chen, Fudan University
  • Lingwei Zhang, Johns Hopkins University
  • Yiran Liu, Tsinghua University
  • Yang Yu, Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v38i19.30086

Keywords:

General

Abstract

While large language models (LLMs) exhibit impressive performance on a wide range of NLP tasks, most of them fail to learn causality from correlation, which prevents them from learning rationales for prediction. Rethinking the whole development process of LLMs is urgently needed, as they are being adopted in critical tasks that require rationales, including legal text prediction (e.g., legal judgment prediction). In this paper, we first explain the underlying theoretical mechanism of this failure and argue that both data imbalance and the omission of causality in model design and selection render the current training-testing paradigm unable to select the unique causality-based model from among correlation-based models. Second, we take the legal text prediction task as a testbed and reconstruct the development process of LLMs by simultaneously infusing causality into model architectures and organizing causality-based adversarial attacks for evaluation. Specifically, building on our theoretical analysis, we propose a causality-aware self-attention mechanism (CASAM), which prevents LLMs from entangling causal and non-causal information by restricting the interaction between causal and non-causal words. Meanwhile, we propose eight kinds of legal-domain-specific attacks to enable causality-based model selection. Our extensive experimental results demonstrate that CASAM achieves state-of-the-art (SOTA) performance and the strongest robustness on three commonly used legal text prediction benchmarks. We make our code publicly available at https://github.com/Carrot-Red/Rethink-LLM-development.
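The abstract describes CASAM only at a high level; the actual architecture and the procedure for identifying causal words are given in the paper and the linked repository. Purely as an illustrative sketch of the stated idea — restricting attention between causal and non-causal words — one possible masked self-attention step could look as follows (PyTorch; the function name, tensor shapes, and the `is_causal_token` mask are assumptions for illustration, not the authors' implementation):

```python
import torch
import torch.nn.functional as F


def casam_attention(q, k, v, is_causal_token):
    """Sketch: block attention between causal and non-causal tokens.

    q, k, v:          (batch, seq_len, d_model) query/key/value projections,
                      as in standard scaled dot-product self-attention.
    is_causal_token:  (batch, seq_len) boolean tensor, True for tokens
                      annotated as causal for the prediction (hypothetical
                      input; in the paper, causal words are identified by
                      the authors' causal analysis).
    """
    d_model = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_model ** 0.5  # (batch, seq, seq)

    # Tokens may attend only within their own group (causal-to-causal or
    # non-causal-to-non-causal), keeping the two information streams
    # disentangled. Each token matches its own group, so no row of the
    # score matrix is fully masked.
    same_group = is_causal_token.unsqueeze(2) == is_causal_token.unsqueeze(1)
    scores = scores.masked_fill(~same_group, float("-inf"))

    weights = F.softmax(scores, dim=-1)
    return weights @ v
```

This is a minimal single-head sketch under the above assumptions; a full model would apply such masking inside each transformer layer alongside the usual projections and residual connections.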

Published

2024-03-24

How to Cite

Chen, H., Zhang, L., Liu, Y., & Yu, Y. (2024). Rethinking the Development of Large Language Models from the Causal Perspective: A Legal Text Prediction Case Study. Proceedings of the AAAI Conference on Artificial Intelligence, 38(19), 20958-20966. https://doi.org/10.1609/aaai.v38i19.30086

Issue

Vol. 38 No. 19 (2024)

Section

AAAI Technical Track on Safe, Robust and Responsible AI Track