To Interpret or Not to Interpret PCA? This Is Our Question

Dan Vilenchik; Barak Yichye; Maor Abutbul

doi:10.1609/icwsm.v13i01.3265

Authors

Dan Vilenchik, Ben-Gurion University of the Negev
Barak Yichye, Ben-Gurion University of the Negev
Maor Abutbul, Ben-Gurion University of the Negev

DOI:

https://doi.org/10.1609/icwsm.v13i01.3265

Abstract

Principal Component Analysis (PCA) is a central tool for analyzing data, and social media data in particular. Typically, the data is projected onto the first two principal components (PCs) to obtain a two-dimensional view, in which trends and patterns are then examined. A key to making sense of the projected data is the semantic interpretation of the new axes (the PCs). To label a PC, one usually looks at its top k vector entries in absolute value and assigns meaning according to them. The choice of k is made by “eyeballing” the vector. In this work we provide a computational framework to support this process and suggest an interpretability score, which measures how sensitive the interpretation step could be to the choice of k. Furthermore, we give a visual method for choosing the optimal k. We study our methodology on four social media platforms and find that in two of them, Twitter and Instagram, interpretation can be done in a carefree manner, whereas in Steam and LinkedIn there is no natural labeling of the axes. This separation is clearly reflected in the interpretability score that each dataset received.
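
The manual labeling step described in the abstract (take the top k entries of a PC in absolute value, then read off a meaning) can be sketched as follows. This is a minimal illustration, not the authors' framework: the feature names and synthetic data are hypothetical, and `captured_mass` is an assumed helper that only shows how the picture changes with k. It is not the interpretability score proposed in the paper, which the abstract does not define.

```python
import numpy as np


def principal_components(X, n_components=2):
    """Return the leading principal axes (as rows) via SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:n_components]


def top_k_features(pc, feature_names, k):
    """Names of the k features with the largest absolute loading on this PC."""
    order = np.argsort(-np.abs(pc))[:k]
    return [feature_names[i] for i in order]


def captured_mass(pc, k):
    """Fraction of the PC's squared norm carried by its top-k entries.

    Illustrative quantity only (an assumption of this sketch), not the
    paper's interpretability score.
    """
    sq = np.sort(pc ** 2)[::-1]
    return float(sq[:k].sum() / sq.sum())


# Hypothetical feature names and synthetic data, for illustration only.
rng = np.random.default_rng(0)
names = ["likes", "shares", "followers", "posts", "replies", "mentions"]
X = rng.normal(size=(200, len(names)))
X[:, 0] *= 5.0  # inflate the variance of "likes" so it dominates the first PC

pcs = principal_components(X)
for k in (1, 2, 3):
    # Show which features would be used to label PC 1 for each choice of k.
    print(k, top_k_features(pcs[0], names, k), round(captured_mass(pcs[0], k), 3))
```

If the set of selected features, and hence the label one would assign, changes substantially as k varies, the interpretation is sensitive to the choice of k, which is the concern the abstract raises.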
