We have implemented a successful test case in which we created a set of reviews, half of which are for Chinese restaurants and half for Indian restaurants. We have successfully developed a proof-of-concept for a process of Yelp business sub-categorization through textual data mining. The most prevalent words from reviews in each business cluster are then calculated and displayed. These clusters are built up gradually until a stopping case is satisfied which indicates the presence of two significant clusters. Then, we created a recursive function to build up business clusters by merging businesses with their ‘nearest neighbor’ business based on Jaccard similarity. To build up clusters of similar businesses based on the text of their reviews, we used an agglomerative technique in which we initialized a clusters dictionary with each individual businesses being the sole member of its own cluster.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |