Twitter Sentiment Analysis Data

Each database (*.db) contains three columns. The first column holds the date and time of the tweet, the second holds the tweet text, and the third holds a sentiment score for that tweet in the range [-1, 1], where -1 is the most negative, 0 is neutral, and +1 is the most positive. The tweets were collected by the model deployed at [1]. The sentiment score in the last column is not estimated by that model, which is still in the pre-alpha phase. Instead, to give NLP researchers easy access to a sentiment analysis of each collected tweet, the score produced by TextBlob [2] has been appended as the last column. Note: This project was designed as part of the major project (a disaster response system) to understand how VMs and databases can be optimized for efficient classification of social media messages.
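As a rough illustration of the score range described above, the sketch below maps a score in [-1, 1] to a coarse label. The helper name label_sentiment and the choice to treat only an exact 0 as neutral are assumptions made for this example; they are not part of the dataset.

```python
def label_sentiment(score: float) -> str:
    """Map a sentiment score in [-1, 1] to a coarse label (hypothetical helper)."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score {score} is outside the expected range [-1, 1]")
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # assumption: only an exact 0 counts as neutral


# Example scores chosen for illustration only.
labels = [label_sentiment(s) for s in (-1.0, -0.25, 0.0, 0.6, 1.0)]
print(labels)
```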

If you're looking for geolocation-based COVID-19 sentiment data:


[1]   [2]


The .db files are SQLite files and can be handled like any other SQLite database.
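Before querying an unfamiliar file, it can help to check which tables and columns it contains. The sketch below is one way to do that with Python's built-in sqlite3 module. It assumes an in-memory database as a stand-in for a real .db file, and the column names date and tweet are hypothetical; inspect an actual file to see its real schema.

```python
import sqlite3

# Assumption: an in-memory database stands in for a real .db file so the
# sketch runs on its own; replace ":memory:" with the path to a .db file.
conn = sqlite3.connect(":memory:")

# Hypothetical schema mirroring the three columns described in this document.
conn.execute("CREATE TABLE sentiment (date TEXT, tweet TEXT, sentiment REAL)")

# List the tables stored in the database.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# List the column names of each table. PRAGMA table_info returns one row per
# column: (cid, name, type, notnull, dflt_value, pk).
columns = {
    table: [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
    for table in tables
}
conn.close()

print(tables)
print(columns)
```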

Below is an illustration of how to connect to the SQLite database, fetch the entire sentiment table as a pandas data frame, and count the positive, negative, and neutral tweets in that data frame.

import sqlite3

import pandas as pd

conn = sqlite3.connect('/path/to/file.db')  # open the SQLite file

df_pie = pd.read_sql("SELECT * FROM sentiment", conn)  # load the whole table

conn.close()  # the data frame no longer needs the connection

total_tweets = df_pie.shape[0]

positive_tweets = int((df_pie['sentiment'] > 0).sum())  # scores above 0

negative_tweets = int((df_pie['sentiment'] < 0).sum())  # scores below 0

neutral_tweets = total_tweets - positive_tweets - negative_tweets  # remaining rows
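The same counts can also be computed inside SQLite, which avoids loading every tweet into memory. The sketch below is one possible approach, not the method used to build the dataset. It assumes an in-memory database with made-up rows and hypothetical date and tweet column names; only the sentiment table and column names come from the query above.

```python
import sqlite3

# Assumption: an in-memory database with invented rows stands in for a real file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sentiment (date TEXT, tweet TEXT, sentiment REAL)")
conn.executemany(
    "INSERT INTO sentiment VALUES (?, ?, ?)",
    [
        ("2020-04-01 10:00:00", "example tweet a", 0.5),
        ("2020-04-01 10:01:00", "example tweet b", -0.2),
        ("2020-04-01 10:02:00", "example tweet c", 0.0),
        ("2020-04-01 10:03:00", "example tweet d", 0.8),
    ],
)

# A comparison evaluates to 1 or 0 in SQLite, so SUM counts the matching rows.
total, positive, negative, neutral = conn.execute(
    """
    SELECT COUNT(*),
           SUM(sentiment > 0),
           SUM(sentiment < 0),
           SUM(sentiment = 0)
    FROM sentiment
    """
).fetchone()
conn.close()

print(total, positive, negative, neutral)
```

Unlike the subtraction used for neutral_tweets above, the SUM(sentiment = 0) term counts only rows whose score is exactly 0, so rows with a missing score would be excluded from all three categories.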


