Embedding-Based Complex Feature Value Coupling Learning for Detecting Outliers in Non-IID Categorical Data
Non-IID categorical data is ubiquitous and common in realworld applications. Learning various kinds of couplings has been proved to be a reliable measure when detecting outliers in such non-IID data. However, it is a critical yet challenging problem to model, represent, and utilise high-order complex value couplings. Existing outlier detection methods normally only focus on pairwise primary value couplings and fail to uncover real relations that hide in complex couplings, resulting in suboptimal and unstable performance. This paper introduces a novel unsupervised embedding-based complex value coupling learning framework EMAC and its instance SCAN to address these issues. SCAN first models primary value couplings. Then, coupling bias is defined to capture complex value couplings with different granularities and highlight the essence of outliers. An embedding method is performed on the value network constructed via biased value couplings, which further learns high-order complex value couplings and embeds these couplings into a value representation matrix. Bidirectional selective value coupling learning is proposed to show how to estimate value and object outlierness through value couplings. Substantial experiments show that SCAN (i) significantly outperforms five state-of-the-art outlier detection methods on thirteen real-world datasets; and (ii) has much better resilience to noise than its competitors.