Multiset Feature Learning for Highly Imbalanced Data Classification

Fei Wu; Xiao-Yuan Jing; Shiguang Shan; Wangmeng Zuo; Jing-Yu Yang

doi:10.1609/aaai.v31i1.10739

Authors

Fei Wu, Wuhan University and Nanjing University of Posts and Telecommunications
Xiao-Yuan Jing, Wuhan University and Nanjing University of Posts and Telecommunications
Shiguang Shan, Chinese Academy of Sciences (CAS)
Wangmeng Zuo, Harbin Institute of Technology
Jing-Yu Yang, Nanjing University of Science and Technology

Abstract

As data volumes grow, imbalanced data has become increasingly common. When the imbalance ratio is high, the classification performance of most existing imbalanced learning methods declines. A few methods designed for highly imbalanced learning have been presented to address this problem, but most of them remain sensitive to a high imbalance ratio. This work aims to provide an effective solution to the highly imbalanced data classification problem, approaching it from the perspective of feature learning. We partition the majority class into multiple blocks, each balanced with respect to the minority class, and combine each block with the minority class to construct a balanced sample set. Multiset feature learning (MFL) is then performed on these sets to learn discriminant features. On this basis, we propose an uncorrelated cost-sensitive multiset learning (UCML) approach. UCML provides a strategy for constructing the multiple sets, incorporates a cost-sensitive factor into MFL, and designs a weighted uncorrelated constraint to remove the correlation among multiset features. Experiments on five highly imbalanced datasets indicate that UCML outperforms state-of-the-art imbalanced learning methods.
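The set construction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how majority samples are assigned to blocks, so the random shuffle, the function name `build_balanced_sets`, and the toy data are all assumptions made for illustration. The cost-sensitive factor and the weighted uncorrelated constraint are not covered here.

```python
import random


def build_balanced_sets(majority, minority, seed=0):
    """Split `majority` into blocks of roughly len(minority) samples and
    pair each block with the full minority class.

    Hypothetical helper: random assignment to blocks is an assumption,
    not something stated in the abstract.
    """
    if not minority:
        raise ValueError("minority class must be non-empty")
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    # Number of blocks so that each block is about as large as the minority class.
    n_blocks = max(1, len(shuffled) // len(minority))
    sets = []
    for b in range(n_blocks):
        # Strided slice: every majority sample lands in exactly one block,
        # and block sizes differ by at most one.
        block = shuffled[b::n_blocks]
        samples = block + list(minority)
        labels = [0] * len(block) + [1] * len(minority)  # 0 = majority, 1 = minority
        sets.append((samples, labels))
    return sets


# Toy data with an imbalance ratio of 10:1 (illustrative values only).
majority = [[float(i), 0.0] for i in range(100)]
minority = [[float(i), 1.0] for i in range(10)]
balanced_sets = build_balanced_sets(majority, minority)
```

Each element of `balanced_sets` is a (samples, labels) pair containing one majority block plus the whole minority class, which is the input a multiset feature learning step would consume.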
