Hierarchical Estimation Framework of Multi-Label Classifying: A Case of Tweets Classifying into Real Life Aspects
Many people share their daily events and opinions on Twitter. Some of these tweets are useful because they comment on aspects of a user's real life, e.g., eating, traffic conditions, and weather. Since some tweets indicate two or more aspects, multi-label classification is required. Typical methods do not perform well on tweets because tweets consist of short, elliptical sentences. To overcome these problems, we are researching a hierarchical estimation framework (HEF) that estimates the aspects of unknown tweets. HEF combines unsupervised and supervised machine learning. In the first phase, it extracts topics from a large collection of tweets using latent Dirichlet allocation (LDA). In the second phase, it calculates the relevance between topics and aspects using a small set of labeled tweets to build associations among them. In this paper, we introduce an entropy feedback method in the second phase. We evaluate the Shannon entropy of each association between aspects and topics, and iteratively calculate feedback coefficients from the entropy to achieve optimal associations. Our experimental evaluations with a large number of actual tweets demonstrate the high efficiency of our multi-labeling method. The entropy feedback method increased the F-measure for all aspects. Especially for the Disaster and Traffic aspects, precision greatly increased without decreasing recall.
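As a minimal illustration of the entropy evaluation described above, the following sketch computes the Shannon entropy of topic-to-aspect association weights. The topic names, aspect list, and weights are hypothetical values chosen for the example, and the function `shannon_entropy` is an illustrative helper, not the paper's implementation. The feedback coefficient calculation itself is not specified here and is therefore omitted.

```python
import math

def shannon_entropy(weights):
    """Shannon entropy (in bits) of non-negative weights, normalized to sum to 1."""
    total = sum(weights)
    if total <= 0:
        return 0.0
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)

# Hypothetical association weights between LDA topics (rows) and aspects (columns),
# e.g. counts of labeled tweets per aspect in which each topic is dominant.
aspects = ["Eating", "Traffic", "Weather"]
associations = {
    "topic_0": [8, 0, 0],  # concentrated on one aspect
    "topic_1": [2, 2, 4],  # partly spread
    "topic_2": [3, 3, 3],  # spread evenly over all aspects
}

entropies = {topic: shannon_entropy(w) for topic, w in associations.items()}

for topic, h in entropies.items():
    print(f"{topic}: entropy = {h:.3f} bits")
```

A topic whose weight is concentrated on a single aspect has entropy near zero, whereas a topic spread evenly over all aspects reaches the maximum, log2 of the number of aspects. Under this reading, low-entropy topics are the more discriminative evidence for an aspect.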