Predicting User Replying Behavior on a Large Online Dating Site
Keywords:online dating, user behavior, reply prediction
Online dating sites have become popular platforms for people to look for potential romantic partners. Many online dating sites provide recommendations on compatible partners based on their proprietary matching algorithms. It is important that not only the recommended dates match the user's preference or criteria, but also the recommended users are interested in the user and likely to reciprocate when contacted. The goal of this paper is to predict whether an initial contact message from a user will be replied to by the receiver. The study is based on a large scale real-world dataset obtained from a major dating site in China with more than sixty million registered users. We formulate our reply prediction as a link prediction problem of social networks and approach it using a machine learning framework. The availability of a large amount of user profile information and the bipartite nature of the dating network present unique opportunities and challenges to the reply prediction problem. We extract user-based features from user profiles and graph-based features from the bipartite dating network, apply them in a variety of classification algorithms, and compare the utility of the features and performance of the classifiers. Our results show that the user-based and graph-based features result in similar performance, and can be used to effectively predict the reciprocal links. Only a small performance gain is achieved when both feature sets are used. Among the five classifiers we considered, random forests method outperforms the other four algorithms (naive Bayes, logistic regression, KNN, and SVM). Our methods and results can provide valuable guidelines to the design and performance of recommendation engine for online dating sites.