Multiset Feature Learning for Highly Imbalanced Data Classification

Authors

  • Fei Wu, Wuhan University and Nanjing University of Posts and Telecommunications
  • Xiao-Yuan Jing, Wuhan University and Nanjing University of Posts and Telecommunications
  • Shiguang Shan, Chinese Academy of Sciences (CAS)
  • Wangmeng Zuo, Harbin Institute of Technology
  • Jing-Yu Yang, Nanjing University of Science and Technology

DOI:

https://doi.org/10.1609/aaai.v31i1.10739

Abstract

As data volumes grow, imbalanced data has become increasingly common. When the imbalance ratio is high, most existing imbalanced learning methods suffer a drop in classification performance. A few methods have been proposed specifically for highly imbalanced learning, but most of them remain sensitive to a high imbalance ratio. This work aims to provide an effective solution to the highly imbalanced data classification problem by approaching it from the perspective of feature learning. We partition the majority class into multiple blocks, each balanced with the minority class, and combine each block with the minority class to construct a balanced sample set. Multiset feature learning (MFL) is then performed on these sets to learn discriminant features. On this basis, we propose an uncorrelated cost-sensitive multiset learning (UCML) approach. UCML provides a strategy for constructing the multiple sets, incorporates a cost-sensitive factor into MFL, and designs a weighted uncorrelated constraint to remove the correlation among multiset features. Experiments on five highly imbalanced datasets indicate that UCML outperforms state-of-the-art imbalanced learning methods.
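The multiple-sets construction step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the number of blocks is assumed to be the floor of the imbalance ratio, blocks are assumed to be formed by random shuffling with strided slicing, and the function name `build_balanced_sets` and the 0/1 label encoding are hypothetical. The sketch does not cover UCML's cost-sensitive factor or its weighted uncorrelated constraint, which the abstract does not specify in detail.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

MAJORITY_LABEL = 0  # hypothetical label encoding
MINORITY_LABEL = 1


def build_balanced_sets(
    majority: Sequence[T], minority: Sequence[T], seed: int = 0
) -> List[List[Tuple[T, int]]]:
    """Partition the majority class into blocks roughly the size of the
    minority class and pair each block with every minority sample."""
    if not minority:
        raise ValueError("minority class must be non-empty")
    # Assumed block count: floor of the imbalance ratio, at least 1.
    k = max(1, len(majority) // len(minority))
    shuffled = list(majority)
    random.Random(seed).shuffle(shuffled)
    sets = []
    for i in range(k):
        # Strided slicing spreads samples so block sizes differ by at most 1,
        # and every majority sample lands in exactly one block.
        block = shuffled[i::k]
        labelled = [(x, MAJORITY_LABEL) for x in block]
        # The full minority class is reused in every balanced set.
        labelled += [(x, MINORITY_LABEL) for x in minority]
        sets.append(labelled)
    return sets


if __name__ == "__main__":
    sets = build_balanced_sets(list(range(100)), list(range(1000, 1010)))
    print(len(sets), [len(s) for s in sets])  # 10 sets of 20 samples each
```

With 100 majority and 10 minority samples (imbalance ratio 10), this yields 10 balanced sets, each holding 10 majority and 10 minority samples. A feature learner would then be trained jointly on these sets; that multiset step is beyond this sketch.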

Published

2017-02-12

How to Cite

Wu, F., Jing, X.-Y., Shan, S., Zuo, W., & Yang, J.-Y. (2017). Multiset Feature Learning for Highly Imbalanced Data Classification. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1). https://doi.org/10.1609/aaai.v31i1.10739

Section

Main Track: Machine Learning Applications