Segmentation of Tweets with URLs and its Applications to Sentiment Analysis


  • Abdullah Aljebreen, Temple University
  • Weiyi Meng, Binghamton University
  • Eduard Dragut, Temple University


Text Classification & Sentiment Analysis, Syntax -- Tagging, Chunking & Parsing, Information Extraction


An important means of disseminating information on social media platforms is to include URLs that point to external sources in user posts. On Twitter, we estimate that about 21% of the daily stream of English-language tweets contains URLs. We observe that NLP tools make little attempt to understand the relationship between the content of the URL and the text surrounding it in a tweet. In this work, we study the structure of tweets with URLs relative to the content of the Web documents pointed to by the URLs. We identify several segment classes that may appear in a tweet with URLs, such as the title of a Web page and the user's original content. Our goals in this paper are to introduce, define, and analyze the segmentation problem of tweets with URLs, to develop an effective algorithm to solve it, and to show that our solution can benefit sentiment analysis on Twitter. We also show that the problem is an instance of the block edit distance problem, and thus an NP-hard problem.
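To make the segmentation idea concrete, here is a minimal sketch in Python. It is not the algorithm from the paper: it assumes the page title is already known, finds only the single longest run of tokens shared by the tweet and the title, and uses hypothetical labels ("title", "user", "url"). A block edit distance formulation would also have to handle moved, split, and truncated blocks, which this greedy heuristic ignores. The function name `segment_tweet` and the `min_tokens` threshold are illustrative assumptions.

```python
import re
from difflib import SequenceMatcher

URL_RE = re.compile(r"https?://\S+")
PUNCT = ".,!?:;\"'"


def normalize(text):
    """Lowercase whitespace-separated tokens and trim edge punctuation."""
    return [tok.lower().strip(PUNCT) for tok in text.split()]


def segment_tweet(tweet, page_title, min_tokens=3):
    """Split a tweet into labeled segments (illustrative heuristic only).

    Labels are hypothetical: 'url' for URL tokens, 'title' for the longest
    token run shared with page_title, 'user' for everything else.
    """
    tokens = tweet.split()
    labels = ["url" if URL_RE.fullmatch(tok) else "user" for tok in tokens]

    tweet_norm = normalize(tweet)
    title_norm = normalize(page_title)

    # Longest contiguous run of tokens common to the tweet and the title.
    matcher = SequenceMatcher(None, tweet_norm, title_norm, autojunk=False)
    match = matcher.find_longest_match(0, len(tweet_norm), 0, len(title_norm))

    # Assumed threshold: very short overlaps are treated as coincidental.
    if match.size >= min_tokens:
        for i in range(match.a, match.a + match.size):
            if labels[i] != "url":
                labels[i] = "title"

    # Merge adjacent tokens that carry the same label into one segment.
    segments = []
    for tok, label in zip(tokens, labels):
        if segments and segments[-1][0] == label:
            segments[-1][1].append(tok)
        else:
            segments.append((label, [tok]))
    return [(label, " ".join(toks)) for label, toks in segments]


if __name__ == "__main__":
    tweet = ("Wow, did not expect this! Scientists discover water on Mars "
             "https://example.com/a")
    for label, text in segment_tweet(tweet, "Scientists Discover Water on Mars"):
        print(label, "|", text)
```

In a sentiment analysis setting, one plausible use of such a split is to score only the "user" segment, so that the wording of a copied headline does not dominate the tweet's polarity.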




How to Cite

Aljebreen, A., Meng, W., & Dragut, E. (2021). Segmentation of Tweets with URLs and its Applications to Sentiment Analysis. Proceedings of the AAAI Conference on Artificial Intelligence, 35(14), 12480-12488. Retrieved from



AAAI Technical Track on Speech and Natural Language Processing I