What Yelp Fake Review Filter Might Be Doing?

Arjun Mukherjee; Vivek Venkataraman; Bing Liu; Natalie Glance

doi:10.1609/icwsm.v7i1.14389

Authors

Arjun Mukherjee, University of Illinois at Chicago
Vivek Venkataraman, University of Illinois at Chicago
Bing Liu, University of Illinois at Chicago
Natalie Glance, Google Inc.

DOI:

https://doi.org/10.1609/icwsm.v7i1.14389

Keywords:

Fake Review Detection, Deceptive Opinion Spam

Abstract

Online reviews have become a valuable resource for decision making. However, their usefulness brings forth a curse: deceptive opinion spam. In recent years, fake review detection has attracted significant attention, yet most review sites still do not publicly filter fake reviews. Yelp is an exception; it has been filtering reviews for the past few years, but its algorithm is a trade secret. In this work, we attempt to find out what Yelp might be doing by analyzing its filtered reviews. The results will be useful to other review hosting sites in their filtering efforts. There are two main approaches to filtering: supervised and unsupervised learning. The features used also fall into roughly two types: linguistic and behavioral. We take a supervised approach because we can use Yelp’s filtered reviews for training. Existing supervised approaches are all based on pseudo fake reviews rather than fake reviews filtered by a commercial Web site. Recently, supervised learning using linguistic n-gram features has been shown to perform extremely well (attaining around 90% accuracy) in detecting crowdsourced fake reviews generated using Amazon Mechanical Turk (AMT). We put these existing research methods to the test and evaluate their performance on real-life Yelp data. To our surprise, the behavioral features perform very well, but the linguistic features are not as effective. To investigate, we propose a novel information-theoretic analysis to uncover the precise psycholinguistic difference between AMT reviews and Yelp reviews (crowdsourced vs. commercial fake reviews). We find something quite interesting. This analysis and the experimental results allow us to postulate that Yelp’s filtering is reasonable and that its filtering algorithm seems to be correlated with abnormal spamming behaviors.
