Multi-Scale Self-Attention for Text Classification

Qipeng Guo; Xipeng Qiu; Pengfei Liu; Xiangyang Xue; Zheng Zhang

doi:10.1609/aaai.v34i05.6290

Authors

Qipeng Guo Fudan University
Xipeng Qiu Fudan University
Pengfei Liu Fudan University
Xiangyang Xue Fudan University
Zheng Zhang NYU

DOI:

https://doi.org/10.1609/aaai.v34i05.6290

Abstract

In this paper, we introduce multi-scale structure, a form of prior knowledge, into self-attention modules. We propose a Multi-Scale Transformer that uses multi-scale multi-head self-attention to capture features at different scales. Drawing on a linguistic perspective and an analysis of a Transformer (BERT) pre-trained on a huge corpus, we further design a strategy to control the scale distribution for each layer. Results on three different kinds of tasks (21 datasets) show that our Multi-Scale Transformer consistently and significantly outperforms the standard Transformer on small and moderate-size datasets.
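
One way to picture multi-scale multi-head self-attention is as ordinary multi-head attention in which each head may only look at a local window around each token, with different heads given different window widths. The sketch below is a minimal illustration under that assumption and is not taken from the paper: the function names, the window sizes, and the random projection matrices (which stand in for learned weights) are all hypothetical, and the abstract does not specify how scales are assigned to heads or layers.

```python
import numpy as np


def local_window_mask(seq_len, window):
    """Boolean mask: position i may attend to j only if |i - j| <= window // 2."""
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window // 2


def multi_scale_attention(x, windows, rng):
    """One attention head per entry in `windows` (hypothetical scale assignment)."""
    seq_len, d_model = x.shape
    d_head = d_model // len(windows)
    outputs = []
    for window in windows:
        # Random projections stand in for learned weights (illustration only).
        w_q, w_k, w_v = (
            rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
            for _ in range(3)
        )
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        scores = q @ k.T / np.sqrt(d_head)
        # Positions outside this head's window get -inf, so softmax gives them 0.
        scores = np.where(local_window_mask(seq_len, window), scores, -np.inf)
        scores -= scores.max(axis=-1, keepdims=True)
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        outputs.append(weights @ v)
    # Concatenate the heads, as in standard multi-head attention.
    return np.concatenate(outputs, axis=-1)


rng = np.random.default_rng(0)
tokens = rng.standard_normal((6, 8))  # 6 tokens, model width 8
mixed = multi_scale_attention(tokens, windows=[1, 3, 5, 12], rng=rng)
```

In this sketch a window of 1 lets a head see only the token itself, while a window of at least twice the sequence length leaves the mask fully open and recovers ordinary global attention, so small and large scales can coexist in one layer.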

Published

2020-04-03

How to Cite

Guo, Q., Qiu, X., Liu, P., Xue, X., & Zhang, Z. (2020). Multi-Scale Self-Attention for Text Classification. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05), 7847-7854. https://doi.org/10.1609/aaai.v34i05.6290

Issue

Vol. 34 No. 05: AAAI-20 Technical Tracks 5

Section

AAAI Technical Track: Natural Language Processing
