Neural Bag-of-Ngrams


  • Bofang Li Renmin University of China
  • Tao Liu Renmin University of China
  • Zhe Zhao Renmin University of China
  • Puwei Wang Renmin University of China
  • Xiaoyong Du Renmin University of China



Bag-of-ngrams (BoN) models are commonly used for representing text. One of the main drawbacks of traditional BoN is that it ignores n-gram semantics. In this paper, we introduce the concept of Neural Bag-of-ngrams (Neural-BoN), which replaces the sparse one-hot n-gram representation in traditional BoN with dense, semantically rich n-gram representations. We first propose context guided n-gram representation by adding n-grams to the word-embedding model. However, the context guided learning strategy of word embeddings is likely to miss some semantics relevant to text-level tasks. Text guided n-gram representation and label guided n-gram representation are proposed to capture additional semantics such as topic or sentiment tendencies. Neural-BoN with the latter two n-gram representations achieves state-of-the-art results on 4 document-level classification datasets and 6 semantic relatedness categories. It is also on par with some sophisticated DNNs on 3 sentence-level classification datasets. Like traditional BoN, Neural-BoN is efficient, robust, and easy to implement. We expect it to serve as a strong baseline and to be used in more real-world applications.
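The core idea can be illustrated with a minimal sketch: extract all n-grams from a text, map each to a dense vector, and represent the text as the average of those vectors rather than a sparse one-hot count vector. This is not the authors' implementation — the class name `NeuralBoN`, the random vector initialization, and the averaging composition are illustrative assumptions; in the paper the n-gram vectors are learned under context, text, or label guidance.

```python
# Toy sketch of the Neural-BoN idea (illustrative, not the paper's code):
# each n-gram gets a dense vector, and a text is represented by the
# average of its n-gram vectors instead of a sparse one-hot count vector.
import numpy as np

def extract_ngrams(tokens, max_n=2):
    """Collect all n-grams of length 1..max_n from a token list."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

class NeuralBoN:
    def __init__(self, dim=8, seed=0):
        self.dim = dim
        self.rng = np.random.default_rng(seed)
        self.vectors = {}  # n-gram -> dense embedding

    def vector(self, ngram):
        # In the paper these vectors are learned from context, text,
        # or label guidance; random vectors merely stand in here.
        if ngram not in self.vectors:
            self.vectors[ngram] = self.rng.normal(size=self.dim)
        return self.vectors[ngram]

    def represent(self, text, max_n=2):
        """Average the dense vectors of all n-grams in the text."""
        grams = extract_ngrams(text.lower().split(), max_n)
        return np.mean([self.vector(g) for g in grams], axis=0)

model = NeuralBoN()
vec = model.represent("a strong and simple baseline")
```

The resulting `vec` is a fixed-size dense representation of the whole text, which can be fed to any downstream classifier in place of a sparse BoN count vector.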




How to Cite

Li, B., Liu, T., Zhao, Z., Wang, P., & Du, X. (2017). Neural Bag-of-Ngrams. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1).



Main Track: NLP and Knowledge Representation