SeqGPT: An Out-of-the-Box Large Language Model for Open Domain Sequence Understanding

Authors

  • Tianyu Yu, Tsinghua University
  • Chengyue Jiang, ShanghaiTech University
  • Chao Lou, ShanghaiTech University
  • Shen Huang, Alibaba Group
  • Xiaobin Wang, Alibaba Group
  • Wei Liu, ShanghaiTech University
  • Jiong Cai, ShanghaiTech University
  • Yangning Li, Tsinghua University
  • Yinghui Li, Tsinghua University
  • Kewei Tu, ShanghaiTech University
  • Hai-Tao Zheng, Tsinghua University
  • Ningyu Zhang, Zhejiang University
  • Pengjun Xie, Alibaba Group
  • Fei Huang, Alibaba Group
  • Yong Jiang, Alibaba Group

DOI:

https://doi.org/10.1609/aaai.v38i17.29917

Keywords:

NLP: (Large) Language Models, NLP: Information Extraction, NLP: Sentence-level Semantics, Textual Inference, etc., NLP: Sentiment Analysis, Stylistic Analysis, and Argument Mining, NLP: Text Classification

Abstract

Large language models (LLMs) have shown impressive abilities on open-domain NLP tasks. However, LLMs are sometimes too unconstrained for natural language understanding (NLU) tasks, which typically require restricted input and output formats. Their performance on NLU tasks is highly sensitive to prompts and demonstrations, and they have been shown to perform poorly on several representative NLU tasks, such as event extraction and entity typing. To this end, we present SeqGPT, a bilingual (i.e., English and Chinese) open-source autoregressive model specially enhanced for open-domain natural language understanding. We express all NLU tasks in terms of two atomic tasks, which define fixed instructions that restrict the input and output formats while remaining "open" to arbitrarily varied label sets. The model is first instruction-tuned with extremely fine-grained labeled data synthesized by ChatGPT and then further fine-tuned on 233 different atomic tasks from 152 datasets across various domains. The experimental results show that SeqGPT has decent classification and extraction abilities and is capable of performing language understanding tasks on unseen domains. We also conduct empirical studies on the scaling of data and model size as well as on transfer across tasks. Our models are accessible at https://github.com/Alibaba-NLP/SeqGPT.
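To make the two-atomic-task formulation concrete, below is a minimal sketch of how one might query such a model for classification and extraction with an open label set. The checkpoint name and the prompt template here are assumptions for illustration, not the paper's exact format; the repository linked above defines the actual instructions.

```python
# Minimal sketch: one fixed instruction format for the two atomic
# tasks (classification and extraction), with an open label set
# supplied at query time. The checkpoint name and prompt template
# below are assumptions; see https://github.com/Alibaba-NLP/SeqGPT
# for the exact format.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "DAMO-NLP/SeqGPT-560M"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

def query(text: str, task: str, labels: list[str]) -> str:
    # The instruction is fixed (task name + label list + output slot),
    # so the I/O format is restricted while the label space stays open.
    prompt = f"Input: {text}\n{task}: {', '.join(labels)}\nOutput: "
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=64)
    # Decode only the newly generated tokens.
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

# Atomic task 1: classification over an arbitrary, user-defined label set.
print(query("The movie was a delightful surprise.",
            "classify", ["positive", "negative"]))
# Atomic task 2: extraction with arbitrary type labels.
print(query("Alibaba Group is headquartered in Hangzhou.",
            "extract", ["organization", "location"]))
```

Because the label set is part of the prompt rather than a fixed output head, the same two templates cover classification, entity typing, extraction, and similar NLU tasks on unseen domains.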

Published

2024-03-24

How to Cite

Yu, T., Jiang, C., Lou, C., Huang, S., Wang, X., Liu, W., Cai, J., Li, Y., Li, Y., Tu, K., Zheng, H.-T., Zhang, N., Xie, P., Huang, F., & Jiang, Y. (2024). SeqGPT: An Out-of-the-Box Large Language Model for Open Domain Sequence Understanding. Proceedings of the AAAI Conference on Artificial Intelligence, 38(17), 19458-19467. https://doi.org/10.1609/aaai.v38i17.29917

Section

AAAI Technical Track on Natural Language Processing II