Learning Representations for Incomplete Time Series Clustering

Qianli Ma; Chuxin Chen; Sen Li; Garrison W. Cottrell

doi:10.1609/aaai.v35i10.17070

Authors

Qianli Ma, South China University of Technology; Key Laboratory of Big Data and Intelligent Robot (South China University of Technology), Ministry of Education
Chuxin Chen, South China University of Technology
Sen Li, South China University of Technology
Garrison W. Cottrell, University of California, San Diego

DOI:

https://doi.org/10.1609/aaai.v35i10.17070

Keywords:

Time-Series/Data Streams

Abstract

Time-series clustering is an essential unsupervised technique for data analysis, applied to many real-world fields, such as medical analysis and DNA microarray. Existing clustering methods are usually based on the assumption that the data is complete. However, time series in real-world applications often contain missing values. The traditional strategy (imputing first and then clustering) does not optimize the imputation and clustering processes as a whole, which not only makes performance dependent on the combination of imputation and clustering methods but also fails to achieve satisfactory results. How best to improve clustering performance on incomplete time series remains a challenge. This paper proposes a novel unsupervised temporal representation learning model, named Clustering Representation Learning on Incomplete time-series data (CRLI). CRLI jointly optimizes the imputation and clustering processes to impute more discriminative values for clustering and to make the learned representations possess good clustering properties. Also, to reduce error propagation from imputation to clustering, we introduce a discriminator to make the distribution of imputed values close to the true one, and we train CRLI in an alternating manner. An experiment conducted on eight real-world incomplete time-series datasets shows that CRLI outperforms existing methods. We demonstrate the effectiveness of the learned representations and the convergence of the model through visualization analysis. Moreover, we reveal that the joint training strategy can impute values close to the true ones in the important sub-sequences and, at the same time, impute more discriminative values in the less important sub-sequences, making the imputed sequence cluster-friendly.
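For orientation, the two-stage baseline that the abstract contrasts with CRLI (impute first, then cluster) can be sketched as follows. This is only an illustration of that baseline, not of CRLI itself: per-series mean imputation and a plain k-means with farthest-point initialization are stand-ins chosen here for brevity, and the names `mean_impute` and `kmeans` as well as the toy data are hypothetical. Note that the imputation step never sees the clustering objective, which is the decoupling the abstract argues against.

```python
import numpy as np

def mean_impute(X):
    """Stage 1: fill each missing value (NaN) with the mean of the
    observed values in the same series. Assumes every series has at
    least one observed value."""
    X = np.asarray(X, dtype=float)
    row_means = np.nanmean(X, axis=1)
    return np.where(np.isnan(X), row_means[:, None], X)

def kmeans(X, k, n_iter=20):
    """Stage 2: plain k-means on the imputed series.
    Deterministic farthest-point initialization (an assumption made
    here so the example is reproducible)."""
    centers = [X[0]]
    for _ in range(1, k):
        # Distance of every point to its nearest already-chosen center.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Toy incomplete time series: two low-valued and two high-valued series.
X = np.array([
    [0.0, np.nan, 0.2, 0.1],
    [0.1, 0.0, np.nan, 0.2],
    [5.0, 5.1, np.nan, 4.9],
    [np.nan, 5.2, 5.0, 5.1],
])

filled = mean_impute(X)        # imputation is fixed before clustering starts
labels = kmeans(filled, k=2)   # clustering cannot influence the imputed values
print(labels)
```

On this toy input the two low-valued series fall in one cluster and the two high-valued series in the other. Any pairing of an imputation method with a clustering method fits the same two-stage shape, which is why the abstract notes that performance depends on the chosen combination.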
