Enhancing Elusive Clues in Knowledge Learning by Contrasting Attention of Language Models

Authors

  • Jian Gao, Department of Energy and Power Engineering, Tsinghua University
  • Xiao Zhang, Department of Electronic Engineering, Tsinghua University
  • Miao Li, Department of Electronic Engineering, Tsinghua University
  • Ji Wu, Department of Electronic Engineering, Tsinghua University; College of AI, Tsinghua University; Beijing National Research Center for Information Science and Technology

DOI:

https://doi.org/10.1609/aaai.v39i22.34563

Abstract

Causal language models acquire vast amounts of knowledge from general text corpora during pretraining, but the efficiency of knowledge learning is known to be unsatisfactory, especially when learning from knowledge-dense, small-sized corpora. The deficiency can stem from long-distance dependencies, which are hard for language models to capture, and from overfitting to co-occurrence patterns and distracting clues in the training text. To address these issues, this paper proposes a method to enhance knowledge learning during language model pretraining by amplifying elusive but important clues in text that the language models discover themselves. We found that larger language models pay more attention to non-obvious but important clues, which smaller language models often overlook. Therefore, these clues can be identified by contrasting the attention weights of large and small language models. Using the identified clues to guide token-dropout data augmentation on the training text, we observed a significant boost in both small and large models' performance on fact memorization. This shows that the behavioral contrast between more- and less-performant language models contains important signals for knowledge learning, and that it can be "amplified" for a straightforward improvement in knowledge-learning efficiency.
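The pipeline the abstract describes can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it assumes per-token attention scores for a text have already been extracted from a large and a small model (e.g., averaged over heads and layers), identifies "elusive clue" tokens as those with the largest positive attention contrast, and then performs clue-guided token dropout that always preserves the clue tokens. The function names, toy scores, and hyperparameters (`top_k`, `drop_p`) are all hypothetical.

```python
import numpy as np

def contrast_clues(attn_large, attn_small, top_k=2):
    """Identify 'elusive clue' tokens: those the large model attends to
    much more than the small model. attn_* are per-token attention
    scores of shape (seq_len,)."""
    contrast = np.asarray(attn_large) - np.asarray(attn_small)
    # Tokens with the largest positive contrast are the elusive clues.
    return set(np.argsort(contrast)[-top_k:].tolist())

def token_dropout_augment(tokens, clue_idx, drop_p=0.3, rng=None):
    """Clue-guided token-dropout augmentation: randomly drop non-clue
    tokens while always keeping clue tokens, so the elusive clues are
    relatively amplified in the augmented training text."""
    rng = rng or np.random.default_rng(0)
    return [t for i, t in enumerate(tokens)
            if i in clue_idx or rng.random() > drop_p]

# Toy example: scores are made up for illustration only.
tokens = ["The", "capital", "of", "France", "is", "Paris"]
attn_large = [0.05, 0.30, 0.02, 0.35, 0.03, 0.25]
attn_small = [0.10, 0.25, 0.05, 0.15, 0.10, 0.35]

clues = contrast_clues(attn_large, attn_small, top_k=2)
augmented = token_dropout_augment(tokens, clues)
```

In this toy run the contrast singles out "capital" and "France", which a model must link across the sentence to memorize the fact, while high-frequency function words remain eligible for dropout.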

Published

2025-04-11

How to Cite

Gao, J., Zhang, X., Li, M., & Wu, J. (2025). Enhancing Elusive Clues in Knowledge Learning by Contrasting Attention of Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 39(22), 23905-23913. https://doi.org/10.1609/aaai.v39i22.34563

Section

AAAI Technical Track on Natural Language Processing I