Stock Market Tweets Data

Citation Author(s):: Bruno Taborda (Instituto Universitário de Lisboa (ISCTE-IUL), Lisbon, Portugal & Instituto Universitário de Lisboa (ISCTE-IUL), ISTAR, Lisbon, Portugal & CISUC - Center for Informatics and Systems of the University of Coimbra, Coimbra, Portugal)

Ana de Almeida (Instituto Universitário de Lisboa (ISCTE-IUL), Lisbon, Portugal & Instituto Universitário de Lisboa (ISCTE-IUL), ISTAR, Lisbon, Portugal & CISUC - Center for Informatics and Systems of the University of Coimbra, Coimbra, Portugal)

José Carlos Dias (Instituto Universitário de Lisboa (ISCTE-IUL), Lisbon, Portugal & Business Research Unit (BRU-IUL), Lisbon, Portugal)

Fernando Batista (Instituto Universitário de Lisboa (ISCTE-IUL), Lisbon, Portugal & INESC-ID, Lisbon, Portugal)

Ricardo Ribeiro (Instituto Universitário de Lisboa (ISCTE-IUL), Lisbon, Portugal & INESC-ID, Lisbon, Portugal)
Submitted by:: Bruno Taborda
Last updated:: Thu, 05/13/2021 - 14:27
DOI:: 10.21227/g8vy-5w61
Data Format:: CSV
Links:: Bruno Taborda LinkedIn

Abstract

Twitter is one of the most popular social networks used as a data source for sentiment analysis. This data set consists of tweets related to the stock market. We collected 943,672 tweets between April 9 and July 16, 2020, using the S&P 500 tag (#SPX500), references to the top 25 companies in the S&P 500 index, and the Bloomberg tag (#stocks). Of the 943,672 tweets, 1,300 were manually annotated into positive, neutral, or negative classes, and a second independent annotator reviewed these annotations. This annotated data set can contribute to creating new domain-specific lexicons or to enriching existing dictionaries, and researchers can use it to train supervised models. Additionally, the full data set can be used for text mining and sentiment analysis related to the stock market.
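As one illustration of how the annotated tweets might seed a domain-specific lexicon, the sketch below assigns each token a simple frequency-based polarity score, (positive count minus negative count) divided by total occurrences. This is not the authors' method; the label strings `positive`/`neutral`/`negative`, the tokenizer, and the helper names `tokenize` and `build_lexicon` are hypothetical choices for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical label values; the encoding used in the annotated file may differ.
POSITIVE, NEUTRAL, NEGATIVE = "positive", "neutral", "negative"

# Words, plus cashtags ($AAPL) and hashtags (#stocks), after lowercasing.
TOKEN_RE = re.compile(r"[$#]?[a-z][a-z0-9_']*")


def tokenize(text):
    """Lowercase a tweet and split it into word, cashtag, and hashtag tokens."""
    return TOKEN_RE.findall(text.lower())


def build_lexicon(labelled, min_count=2):
    """Score each token by (positive - negative) / total occurrences.

    `labelled` is an iterable of (text, label) pairs. Tokens seen in fewer
    than `min_count` tweets are dropped. Scores lie in [-1, 1].
    """
    counts = defaultdict(Counter)
    for text, label in labelled:
        for token in set(tokenize(text)):  # count each token once per tweet
            counts[token][label] += 1
    lexicon = {}
    for token, c in counts.items():
        total = sum(c.values())
        if total >= min_count:
            lexicon[token] = (c[POSITIVE] - c[NEGATIVE]) / total
    return lexicon
```

Counting each token once per tweet keeps a single repetitive tweet from dominating a score; a real lexicon would likely need a larger `min_count` and some smoothing.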

Instructions:

Raw Twitter data were downloaded with the Twitter REST API search, using the Tweepy (version 3.8.0) Python package, which simplifies interaction with the REST API. The Twitter REST API search only retrieves tweets from the past seven days and allows tweets to be filtered by language; the retrieved tweets were restricted to English (en). Data collection ran from April 9 to July 16, 2020, using the following Twitter tags as search parameters: #SPX500, #SP500, SPX500, SP500, $SPX, #stocks, $MSFT, $AAPL, $AMZN, $FB, $BBRK.B, $GOOG, $JNJ, $JPM, $V, $PG, $MA, $INTC, $UNH, $BAC, $T, $HD, $XOM, $DIS, $VZ, $KO, $MRK, $CMCSA, $CVX, $PEP, $PFE. Because of the large volume of raw data retrieved, only each tweet's content and creation date were stored.
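Since only each tweet's content and creation date were kept, the reduction step might look like the sketch below. It is an illustration, not the authors' code. It assumes the raw API responses were saved as newline-delimited JSON Twitter API v1.1 tweet objects, and the helper names `reduce_raw_tweet` and `reduce_raw_lines` are hypothetical. The `created_at`, `text`, `full_text`, and `lang` keys are standard v1.1 tweet fields; `full_text` is present when extended mode is requested.

```python
import json


def reduce_raw_tweet(raw):
    """Keep only the creation date and text of an English tweet.

    `raw` is a Twitter API v1.1 tweet object as a dict. Returns None for
    non-English tweets. In extended mode the text is under `full_text`,
    otherwise under `text`.
    """
    if raw.get("lang") != "en":
        return None
    text = raw.get("full_text", raw.get("text", ""))
    return {"created_at": raw["created_at"], "text": text}


def reduce_raw_lines(lines):
    """Reduce newline-delimited JSON tweets (assumed storage format) to records."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines between JSON objects
            continue
        record = reduce_raw_tweet(json.loads(line))
        if record is not None:
            records.append(record)
    return records
```

The language check is redundant when the search itself was restricted to English, but it keeps the reduction safe if unfiltered raw files are mixed in.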

The file tweets_labelled_09042020_16072020.csv contains 5,000 tweets randomly sampled from the 943,672 collected tweets. Of these 5,000 tweets, 1,300 were manually annotated and then reviewed by a second independent annotator. The file tweets_remaining_09042020_16072020.csv contains the remaining 938,672 tweets.
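To separate the 1,300 annotated tweets from the other sampled rows, a reader might filter on the label column, as in the minimal sketch below. The column names `text` and `sentiment`, the delimiter, and the assumption that unannotated rows leave the label cell empty are all hypothetical; check the actual header of tweets_labelled_09042020_16072020.csv before relying on them.

```python
import csv
import io
from collections import Counter

# Hypothetical column names: verify against the real file header.
TEXT_COL = "text"
LABEL_COL = "sentiment"


def read_annotated(f, delimiter=","):
    """Return (text, label) pairs for rows whose label cell is non-empty.

    If unannotated rows leave the label empty (an assumption), only the
    annotated rows survive this filter.
    """
    reader = csv.DictReader(f, delimiter=delimiter)
    pairs = []
    for row in reader:
        label = (row.get(LABEL_COL) or "").strip()
        if label:
            pairs.append((row[TEXT_COL], label))
    return pairs


# Usage with a small in-memory sample; the real file would be opened with
# open(path, newline="", encoding="utf-8") and the right delimiter.
sample = io.StringIO(
    "id,created_at,text,sentiment\n"
    "1,2020-04-09,$AAPL beats estimates,positive\n"
    "2,2020-04-09,#stocks open flat,\n"
    "3,2020-04-10,$XOM slides,negative\n"
)
pairs = read_annotated(sample)
print(Counter(label for _, label in pairs))
```

On the real file, the length of the returned list is a quick sanity check against the 1,300 annotated tweets reported above.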

Dataset Files

tweets.zip (Size: 55.65 MB)

Open Access dataset files are accessible to all logged-in users. A free IEEE account is sufficient; IEEE membership is not required.

More like this Dataset

Weather Monitoring Station For Farms And Agriculture

Trilateration based on RSSI values in transmitters and receivers

The FLAME dataset: Aerial Imagery Pile burn detection using drones (UAVs)

Retinal Fundus Multi-disease Image Dataset (RFMiD)

Experimental database for detecting and diagnosing rotor broken bar in a three-phase induction motor.

Edge-IIoTset: A New Comprehensive Realistic Cyber Security Dataset of IoT and IIoT Applications: Centralized and Federated Learning