Twitter is one of the most popular social networks for sentiment analysis. This dataset of tweets is related to the stock market. We collected 943,672 tweets between April 9 and July 16, 2020, using the S&P 500 tag (#SPX500), references to the top 25 companies in the S&P 500 index, and the Bloomberg tag (#stocks). Of these 943,672 tweets, 1,300 were manually annotated as positive, neutral, or negative, and a second independent annotator reviewed the annotations.

Instructions: 

Twitter raw data was downloaded through the Twitter REST API search endpoint using the Tweepy (version 3.8.0) Python package, which simplifies interaction between the REST API and developers. The Twitter REST API only retrieves data from the past seven days and allows tweets to be filtered by language; the retrieved tweets were filtered for English (en). Data collection was performed from April 9 to July 16, 2020, using the following Twitter tags as search parameters: #SPX500, #SP500, SPX500, SP500, $SPX, #stocks, $MSFT, $AAPL, $AMZN, $FB, $BBRK.B, $GOOG, $JNJ, $JPM, $V, $PG, $MA, $INTC, $UNH, $BAC, $T, $HD, $XOM, $DIS, $VZ, $KO, $MRK, $CMCSA, $CVX, $PEP, $PFE. Due to the large volume of data retrieved in the raw files, only each tweet's content and creation date were stored.
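As a rough illustration of this collection step, the sketch below uses Tweepy 3.8.0's Cursor over api.search to pull English tweets for a subset of the tags and keep only the creation date and text. The credential placeholders, query subset, item limit, and output file name are illustrative, not part of the dataset.

```python
import csv
import tweepy

# Illustrative credentials -- replace with your own Twitter API keys.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# Subset of the search tags listed above, combined into one query.
query = "#SPX500 OR #SP500 OR $SPX OR #stocks"

with open("spx500_tweets.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["created_at", "text"])
    # Only the creation date and tweet content are stored, as described above.
    for status in tweepy.Cursor(api.search, q=query, lang="en",
                                tweet_mode="extended").items(1000):
        writer.writerow([status.created_at, status.full_text])
```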

 

The file tweets_labelled_09042020_16072020.csv consists of 5,000 tweets selected by random sampling from the 943,672 collected. Of those 5,000 tweets, 1,300 were manually annotated and reviewed by a second independent annotator. The file tweets_remaining_09042020_16072020.csv contains the remaining 938,672 tweets.
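A minimal sketch of this random-sampling split, assuming the full collection has been loaded into a pandas DataFrame (the input file name is illustrative):

```python
import pandas as pd

# Illustrative file name for the full collection of 943,672 tweets.
all_tweets = pd.read_csv("all_tweets_09042020_16072020.csv")

# Randomly sample 5,000 tweets for annotation; the rest go to the "remaining" file.
labelled = all_tweets.sample(n=5000, random_state=42)
remaining = all_tweets.drop(labelled.index)

labelled.to_csv("tweets_labelled_09042020_16072020.csv", index=False)
remaining.to_csv("tweets_remaining_09042020_16072020.csv", index=False)
```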


Using the search keywords above, tweets were extracted between 15 November 2020 and 10 January 2021:

29,499 English tweets extracted
4,628 Japanese tweets extracted
678 Hindi tweets extracted


This dataset is a set of eighteen directed networks that represent message exchanges among Twitter accounts during eighteen crisis events. It comprises 645,339 anonymized unique user IDs and 1,396,709 edges, each labeled with one of Plutchik's basic emotions (anger, fear, sadness, disgust, joy, trust, anticipation, and surprise) or "neutral" (if a tweet conveys no emotion).
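The exact file layout is not specified here, but assuming each network is distributed as an edge list of anonymized user IDs with one emotion label per edge, a labeled directed graph could be loaded roughly as follows (the file name and column layout are assumptions):

```python
from collections import Counter

import networkx as nx
import pandas as pd

# Assumed edge-list layout: source user ID, target user ID, emotion label.
edges = pd.read_csv("crisis_event_01_edges.csv",
                    names=["source", "target", "emotion"])

G = nx.from_pandas_edgelist(edges, source="source", target="target",
                            edge_attr="emotion", create_using=nx.DiGraph)

# Count edges per Plutchik emotion (or "neutral").
label_counts = Counter(nx.get_edge_attributes(G, "emotion").values())
print(label_counts)
```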


This dataset includes 24,201,654 tweets related to the US Presidential Election of November 3, 2020, collected between July 1, 2020, and November 11, 2020. The related party name, the sentiment score of each tweet, and the words that contribute to that score were added to the dataset.

Instructions: 

The dataset contains more than 20 million tweets with 11 attributes for each of them. The data file is in comma-separated values (CSV) format and is 3.48 GB in size. It is zipped with WinRAR for easier upload and download; the zipped file is 766 MB. The file contains the following information (11 columns) for each tweet:

Created-At: Exact creation time of the tweet [Jul 1, 2020 7:44:48 PM – Nov 12, 2020 5:47:59 PM]
From-User-Id: Unique ID of the user that sent the tweet
To-User-Id: Unique ID of the user the tweet was sent to
Language: Language of the tweet, coded in ISO 639-1 [90% en: English; 3.8% und: undetermined; 2.5% es: Spanish]
Retweet-Count: Number of retweets
PartyName: Label showing which party the tweet is about: [Democrats] or [Republicans] if the tweet contains any of the keywords (given above) related to the Democratic or Republican party, [Both] if it contains keywords about both parties, and [Neither] if it contains no keyword about either of the two major parties
Id: Unique ID of the tweet
Score: Sentiment score of the tweet; a positive (negative) score means positive (negative) emotion
Scoring String: Nominal attribute listing all words taking part in the scoring
Negativity: Sum of the negative components
Positivity: Sum of the positive components
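A minimal pandas sketch for loading the file and summarizing the columns listed above; the file name is illustrative and the exact column spellings in the CSV may differ, so treat them as assumptions:

```python
import pandas as pd

# Illustrative file name; the unzipped CSV is about 3.48 GB, so read it in chunks.
chunks = pd.read_csv("us_election_tweets_2020.csv",
                     usecols=["Created-At", "PartyName", "Score"],
                     parse_dates=["Created-At"],
                     chunksize=500_000)

# Example: overall mean sentiment score per party label.
parts = [chunk.groupby("PartyName")["Score"].agg(["sum", "count"]) for chunk in chunks]
totals = pd.concat(parts).groupby(level=0).sum()
print(totals["sum"] / totals["count"])
```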

The VADER (Valence Aware Dictionary and sEntiment Reasoner) algorithm is used for sentiment analysis of the tweets. VADER is a lexicon- and rule-based sentiment algorithm that is specifically attuned to sentiments expressed in social media and produces scores based on a dictionary of words. The operator calculates and exposes the sum of all sentiment word scores in the text. For more details about this algorithm, see https://github.com/cjhutto/vaderSentiment
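For reference, scoring a single text with the vaderSentiment package looks roughly like this; the example tweet is illustrative, and the standard polarity_scores output may differ in scale from the summed Score column described above.

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

# Illustrative tweet text.
text = "Great debate performance tonight, very impressive!"
scores = analyzer.polarity_scores(text)

# 'pos', 'neu', and 'neg' are proportions; 'compound' is the normalized overall score.
print(scores)
```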

This data can be used for developing election result prediction methods by social media. Also, It can be used in text mining studies such as understanding the change of feelings in tweets about parties; determining the topics that cause positive or negative feelings about the candidates; to understand the main issues that Twitter users concern about the USA election.


This dataset contains tweets related to COVID-19. There are 226,668 unique tweet IDs in the whole dataset, ranging from December 2019 to May 2020. The keywords used to crawl the tweets are 'corona', 'covid', 'sarscov2', 'covid19', and 'coronavirus'. To obtain the other 33 fields of data, send an email to avishekgarain@gmail.com. Twitter does not allow public sharing of other details related to tweet data (texts, etc.), so they cannot be uploaded here.

Instructions: 

Read the documentation carefully and use the Python code snippet provided to load the data.
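The original loading snippet is not reproduced here; a minimal stand-in, assuming the shared file is a CSV with a single tweet-ID column (the file and column names are assumptions), would be:

```python
import pandas as pd

# Assumed file layout: one tweet ID per row in a column named "tweet_id".
ids = pd.read_csv("covid19_tweet_ids.csv", dtype={"tweet_id": str})
print(len(ids), "tweet IDs loaded")

# The tweet texts and other fields must be re-hydrated through the Twitter API
# (e.g. with Tweepy), since Twitter's terms do not allow sharing them directly.
```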


This dataset includes Covid-19-related tweet messages written in Turkish that contain at least one of four keywords (Covid, Kovid, Corona, Korona), which are the terms used to refer to the Covid-19 virus in Turkey. Tweet collection started on 11 March 2020, when the first Covid-19 case was seen in Turkey.

The dataset currently contains 4.8 million tweets, with 6 attributes for each tweet, sent from 9 March 2020 until 6 May 2020.


Instructions: 


The original CSV data file is zipped with WinRAR for easier upload and download. The zipped file size is 76 MB.

This data can be used for text mining tasks such as topic modelling and sentiment analysis.

The data file is in comma-separated values (CSV) format and contains the following information (6 columns) for each tweet:

Created-At: Exact creation time of the tweet
From-User-Id: Sender's user ID
To-User-Id: User ID of the recipient, if the tweet was sent to a user
Language: All tweets are Turkish
Retweet-Count: Number of retweets
Id: ID of the tweet, unique across all tweets
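A small sketch for loading the file with the six columns above and counting tweets per day; the file name is illustrative and the column spellings are assumed to appear in the CSV header.

```python
import pandas as pd

# Assumed file name and header row; adjust to the actual CSV.
df = pd.read_csv("turkish_covid19_tweets.csv", parse_dates=["Created-At"])

# Tweets per day between 9 March 2020 and 6 May 2020.
daily = df.set_index("Created-At").resample("D")["Id"].count()
print(daily.head())
```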
