By Avishek Garain, Computer Science and Engineering at Jadavpur University, Kolkata, West Bengal, India
While doing research for a paper on sentiment analysis and review categorization, we collected 69,308 reviews of hotels around the world. Our dataset includes two files: one with organized raw data and another with extracted sentiment values. The raw data includes fields such as hotel name and location, trip type (e.g., business trip or vacation), review text, and the user's thoughts on various features of the hotel, including value, cleanliness, service, location, sleep quality, room quality, check-in service, and business service. The sentiment file takes the review ID and text from the raw data and assigns each review a sentiment value of -1 (negative), 0 (neutral), or +1 (positive) based on the review text.
Our dataset was built using fuzzy techniques outlined in the “Special Issue on Soft Computing for Recommender Systems and Sentiment Analysis” in the Elsevier journal Applied Soft Computing. This dataset is new to its domain, and the results have achieved a macro F1 score of 86 percent. We are already seeing people use this dataset to create knowledge bases for marketing and customer service applications such as chatbots, and for business applications like named entity recognition, sentiment analysis, and review categorization.
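For readers unfamiliar with the metric, macro F1 is the unweighted mean of the per-class F1 scores, so each of the three sentiment classes counts equally regardless of how many reviews fall into it. The following is a minimal sketch of that calculation in Python, not the evaluation code used in our research; the function name `macro_f1` and the toy labels are assumptions made purely for illustration.

```python
from typing import Sequence

# Sentiment classes used in the dataset: negative, neutral, positive.
LABELS = (-1, 0, 1)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int],
             labels: Sequence[int] = LABELS) -> float:
    """Return the unweighted mean of per-class F1 scores (illustrative helper)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)

        # Guard against division by zero when a class is never predicted or never present.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Toy labels for illustration only; these are not taken from the dataset.
y_true = [1, 1, 0, -1, -1, 0]
y_pred = [1, 0, 0, -1, 1, 0]
print(f"macro F1: {macro_f1(y_true, y_pred):.3f}")
```

Because every class carries the same weight, a model that performs poorly on a rare class (for example, neutral reviews) is penalized more heavily than it would be under plain accuracy.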
Benefits of Hosting Data on the IEEE DataPort Platform
Above all else, IEEE DataPort provides global exposure for my data, and this exposure helps to create reproducible research. Other high-quality datasets on IEEE DataPort, such as a popular dataset for sentiment analysis of tweets, have inspired many projects like my own.
The dataset from the research conducted by Avishek Garain won the Spring 2020 IEEE DataPort Data Competition.