This dataset contains IDs and sentiment scores of geo-tagged tweets related to the COVID-19 pandemic. The real-time Twitter feed is monitored for coronavirus-related tweets using 90+ different keywords and hashtags that are commonly used while referencing the pandemic. Complying with Twitter's content redistribution policy, only the tweet IDs are shared. You can re-construct the dataset by hydrating these IDs. For detailed instructions on the hydration of tweet IDs, please read this article.
Each CSV file contains a list of tweet IDs. You can use these tweet IDs to download fresh data from Twitter (read this article: hydrating tweet IDs). To make it easy for the NLP researchers to get access to the sentiment analysis of each collected tweet, the sentiment score computed by TextBlob has been appended as the second column. To hydrate the tweet IDs, you can use applications such as Hydrator (available for OS X, Windows and Linux) or twarc (python library).
Getting the CSV files of this dataset ready for hydrating the tweet IDs:
import pandas as pd
dataframe.to_csv("ready_april28_april29.csv", index=False, header=None)
The above example code takes in the original CSV file (i.e., april28_april29.csv) from this dataset and exports just the tweet ID column to a new CSV file (i.e., ready_april28_april29.csv). The newly created CSV file can now be consumed by the Hydrator application for hydrating the tweet IDs. To export the tweet ID column into a TXT file, just replace ".csv" with ".txt" in the to_csv function (last line) of the above example code.
If you are not comfortable with Python and pandas, you can upload these CSV files to your Google Drive and use Google Sheets to delete the second column. Once finished with the deletion, download the edited CSV files: File > Download > Comma-separated values (.csv, current sheet). These downloaded CSV files are now ready to be used with the Hydrator app for hydrating the tweet IDs.