The Generalization and Robustness of Transformer-Based Language Models on Commonsense Reasoning
DOI:
https://doi.org/10.1609/aaai.v38i21.30410
Keywords:
Commonsense Reasoning, Large Language Models, Robustness, Generalizability, Natural Language Processing
Abstract
The advent of powerful transformer-based discriminative language models and, more recently, generative GPT-family models has led to notable advancements in natural language processing (NLP) tasks. One such task is commonsense reasoning, where performance is usually evaluated through multiple-choice question-answering benchmarks. To date, many such benchmarks have been proposed, and "leaderboards" tracking state-of-the-art performance on them suggest that transformer-based models are approaching human-like performance. However, due to documented problems such as hallucination and bias, the research focus is shifting from merely quantifying accuracy on the task to in-depth, context-sensitive probing of LLMs' generalization and robustness. To diagnose these models' performance in commonsense reasoning scenarios more deeply, this thesis presents three main studies: the generalization ability of transformer-based language models on commonsense reasoning, the trend in the confidence distribution of these language models when confronted with ambiguous inference tasks, and a proposed risk-centric evaluation framework for both discriminative and generative language models.
Published
2024-03-24
How to Cite
Shen, K. (2024). The Generalization and Robustness of Transformer-Based Language Models on Commonsense Reasoning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(21), 23419-23420. https://doi.org/10.1609/aaai.v38i21.30410
Section
AAAI Doctoral Consortium Track