Sentiment Analysis

Air travel is one of the most widely used modes of transport in daily life, so it is no wonder that more and more people share their experiences with airlines and airports through online surveys. This dataset supports topic modeling and sentiment analysis of airline postings on Skytrax and Tripadvisor, platforms that attract strong interest and engagement from people who have flown, or plan to fly, with these airlines.


Companion data for the paper "Using social media and personality traits to assess software developers’ emotions", submitted to the journal IEEE Access in 2022. It contains the anonymized data used in the study, including the answers to the demographic survey, the answers to the Big Five Inventory, the experiment protocol, the manual analyses by psychologists and participants, and all generated charts and data analyses.


Twitter is one of the most popular social networks for sentiment analysis. This dataset contains tweets related to the stock market. We collected 943,672 tweets between April 9 and July 16, 2020, using the S&P 500 tag (#SPX500), references to the top 25 companies in the S&P 500 index, and the Bloomberg tag (#stocks). Of these, 1,300 tweets were manually annotated as positive, neutral, or negative, and a second independent annotator reviewed the annotated tweets.
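The description does not say how the second annotator's review was quantified. One common way to measure agreement between two annotators on a three-class labeling like this is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below assumes both annotators' labels are available as parallel lists; the function name and the sample labels are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        # Both annotators used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for six tweets (not from the dataset).
annotator_1 = ["positive", "neutral", "negative", "neutral", "positive", "negative"]
annotator_2 = ["positive", "neutral", "negative", "positive", "positive", "negative"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))  # → 0.75
```

Here the annotators agree on 5 of 6 tweets (observed agreement 5/6), chance agreement is 1/3, and kappa is (5/6 - 1/3) / (2/3) = 0.75.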


India is known for its highly disciplined foreign policy, strategic location, and vibrant, massive diaspora. India seeks to enhance its scope of cooperation and trade and to widen its sphere of relations with the Pacific. As a result, the world is witnessing the rise of Indo-Pacific ties. Before the 1980s, the Atlantic was regarded as the keystone of the world, but the term “Indo-Pacific” now signals a radical shift toward the East.


This dataset was extracted from Twitter using keywords related to Dilma Rousseff and Aécio Neves, the candidates in the second round of the 2014 Brazilian presidential election. It contains Portuguese-language texts and their sentiment classifications, produced with the techniques described in the article published in the 2018 IEEE International Conference on Data Mining Workshops (ICDMW).



This dataset includes 24,201,654 tweets related to the US presidential election of November 3, 2020, collected between July 1 and November 11, 2020. For each tweet, the related party name, the sentiment score, and the words that affected the score were added to the dataset.
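The description does not state which sentiment method produced the scores. As an illustration of how a score can be reported together with "the words that affect the score", here is a minimal lexicon-based sketch: each token found in a word-to-polarity lexicon contributes its weight, and the matched tokens are returned alongside the total. The lexicon, its weights, and the function name are hypothetical, chosen only for the example.

```python
import re

# Hypothetical toy lexicon (word -> polarity weight), for illustration only.
LEXICON = {
    "great": 2.0,
    "win": 1.5,
    "hope": 1.0,
    "bad": -1.5,
    "corrupt": -2.0,
    "lose": -1.5,
}


def score_tweet(text):
    """Return (score, contributing_words) for a tweet.

    The score is the sum of the lexicon weights of all matched tokens;
    contributing_words lists those tokens in order of appearance.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    contributing = [t for t in tokens if t in LEXICON]
    score = sum(LEXICON[t] for t in contributing)
    return score, contributing


score, words = score_tweet("Great debate tonight, I hope they win!")
print(score, words)  # → 4.5 ['great', 'hope', 'win']
```

This toy scorer ignores negation and intensifiers ("don't lose" would still count as negative), which established lexicon-based tools handle with additional rules.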


This dataset contains nearly 1 million unique movie reviews of 1,150 different IMDb movies spread across 17 IMDb genres: Action, Adventure, Animation, Biography, Comedy, Crime, Drama, Fantasy, History, Horror, Music, Mystery, Romance, Sci-Fi, Sport, Thriller, and War. It also contains movie metadata such as release date, run length, IMDb rating, content rating (PG-13, R, etc.), number of IMDb raters, and number of reviews per movie.


This dataset page is currently being updated. The tweets collected by the model deployed at are shared here. However, because of COVID-19, all of my computing resources are currently dedicated to collecting tweets related to the pandemic. Those tweets can be accessed through the following datasets:


This dataset contains online user reviews of two types of tourist attractions in London: parks and art museums.

London Parks Reviews

This part of the dataset covers the five most visited parks: St. James' Park, Hyde Park, Regent's Park, Kensington Park, and Greenwich Park. For each park, 600 reviews posted between January and September 2017 are provided in CSV format.
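The description only says the park reviews are provided in CSV format, so the column names (`park`, `date`, `review`) and the ISO date format below are assumptions, as is the small inline sample. The sketch shows one way to check per-park review counts while keeping only reviews dated January through September 2017.

```python
import csv
import io
from collections import Counter
from datetime import date

# Hypothetical CSV layout and sample rows; not taken from the dataset.
SAMPLE_CSV = """park,date,review
Hyde Park,2017-01-14,Lovely walk by the Serpentine.
Hyde Park,2017-08-02,Busy but beautiful.
Greenwich Park,2017-05-20,Great view over the city.
Regent's Park,2017-10-03,Outside the collection window.
"""

START, END = date(2017, 1, 1), date(2017, 9, 30)


def reviews_per_park(csv_text):
    """Count reviews per park, keeping only those dated January-September 2017."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        posted = date.fromisoformat(row["date"])
        if START <= posted <= END:
            counts[row["park"]] += 1
    return counts


print(reviews_per_park(SAMPLE_CSV))
# → Counter({'Hyde Park': 2, 'Greenwich Park': 1})
```

If a full file matches the description, each of the five parks would be expected to show a count of 600.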

London Art Museums