Corona Virus (COVID-19) Turkish Tweets Dataset

Citation Author(s): Ibrahim Sabuncu (Yalova University), Zeynep Yurek
Submitted by: Ibrahim Sabuncu
Last updated: Tue, 05/19/2020 - 07:48
DOI: 10.21227/0wf0-0792

This dataset contains Covid-19-related tweets written in Turkish that include at least one of four keywords (Covid, Kovid, Corona, Korona), the terms used in Turkey to refer to the Covid-19 virus. The collected tweets begin on 11 March 2020, the date the first Covid-19 case was detected in Turkey.

The dataset currently contains 4.8 million tweets, each with 6 attributes, sent between 9 March 2020 and 6 May 2020.

The data file is in comma-separated values (CSV) format. It contains the following 6 columns for each tweet:

Created-At: Exact creation time of the tweet
From-User-Id: User ID of the sender
To-User-Id: User ID of the recipient, if the tweet was sent to a user
Language: Turkish for all tweets
Retweet-Count: Number of retweets
Id: Tweet ID, unique across all tweets
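
As an illustration of the column layout above, the following minimal Python sketch reads rows with the six listed columns. It assumes the file has a header row that uses exactly these column names, which the description does not confirm. The sample rows, the timestamp format, and the language code "tr" are made up for the example.

```python
import csv
import io
from typing import Iterator, TextIO

# Column names as listed above. Whether the real file has a header row with
# exactly these spellings is an assumption.
COLUMNS = ["Created-At", "From-User-Id", "To-User-Id",
           "Language", "Retweet-Count", "Id"]


def read_tweets(handle: TextIO) -> Iterator[dict]:
    """Yield one dict per tweet row, with light type conversion."""
    reader = csv.DictReader(handle)
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for row in reader:
        yield {
            # Kept as text: the timestamp format is not documented.
            "created_at": row["Created-At"],
            "from_user_id": row["From-User-Id"],
            # Empty when the tweet is not addressed to a user.
            "to_user_id": row["To-User-Id"] or None,
            "language": row["Language"],
            "retweet_count": int(row["Retweet-Count"] or 0),
            # Kept as a string so large IDs are never rounded.
            "id": row["Id"],
        }


# Made-up sample rows, for illustration only.
SAMPLE = (
    "Created-At,From-User-Id,To-User-Id,Language,Retweet-Count,Id\n"
    "2020-03-30 10:00:00,111,,tr,3,1001\n"
    "2020-03-30 10:05:00,222,111,tr,0,1002\n"
)

tweets = list(read_tweets(io.StringIO(SAMPLE)))
print(len(tweets), tweets[1]["to_user_id"])
```

For the real file, the `io.StringIO(SAMPLE)` handle would be replaced by an `open(...)` call on the extracted CSV, opened with `newline=""` as the `csv` module recommends.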

The Search Twitter operator of the RapidMiner software was used to collect tweets via the Twitter API. Because the keywords used changed over the collection period, and because of some technical constraints, fewer tweets than usual were collected on some days before 30 March. After 30 March, all Turkish tweets about Covid-19 were collected continuously. The details are explained below.
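
Users who want to check which days are thinly covered can count tweets per day from the Created-At column. The sketch below is one possible approach, not part of the original collection process. It assumes Created-At values are in an ISO-like "YYYY-MM-DD HH:MM:SS" format (not documented here), and the "below half of the median" rule is an arbitrary heuristic chosen for illustration.

```python
from collections import Counter
from datetime import datetime
from statistics import median


def daily_counts(created_at_values):
    """Count tweets per calendar day from Created-At strings.

    Assumes an ISO-like timestamp format; adjust the parsing if the
    real file uses a different one.
    """
    return Counter(datetime.fromisoformat(v).date() for v in created_at_values)


def sparse_days(counts, fraction=0.5):
    """Return days whose count is below `fraction` of the median daily count."""
    if not counts:
        return []
    threshold = fraction * median(counts.values())
    return sorted(day for day, n in counts.items() if n < threshold)


# Made-up timestamps, for illustration only.
SAMPLE = [
    "2020-03-29 09:00:00",
    "2020-03-30 09:00:00", "2020-03-30 10:00:00", "2020-03-30 11:00:00",
    "2020-03-31 09:00:00", "2020-03-31 10:00:00", "2020-03-31 11:00:00",
]

counts = daily_counts(SAMPLE)
print(sparse_days(counts))
```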

Data collection started on 17 March. The Twitter API used imposes an upper limit of 10,000 tweets per search and can only retrieve tweets from the past week. Therefore, the oldest tweets that could be collected date from 11 March. The first cases in Turkey were also detected on 11 March, so the dataset covers tweets from the day the first cases were detected in Turkey.
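
The date arithmetic behind these limits can be sketched as follows. The sketch assumes the one-week window includes the collection start day itself, which is what makes 17 March reach back to 11 March. The upper bound on backfilled tweets further assumes one capped search per day. All constant names are illustrative.

```python
from datetime import date, timedelta

COLLECTION_START = date(2020, 3, 17)
LOOKBACK_DAYS = 7      # "last week" limit; counting the start day is an assumption
SEARCH_CAP = 10_000    # upper limit of tweets returned per search

# Oldest day reachable when the window includes the start day.
oldest_day = COLLECTION_START - timedelta(days=LOOKBACK_DAYS - 1)

# Days covered by the backfill, oldest first.
backfill_days = [oldest_day + timedelta(days=i) for i in range(LOOKBACK_DAYS)]

# Upper bound on backfilled tweets, assuming one capped search per day.
max_backfill = SEARCH_CAP * len(backfill_days)

print(oldest_day.isoformat(), len(backfill_days), max_backfill)
```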

RapidMiner data mining software was used to collect the data. A maximum of 10,000 tweets were collected for each day from 11 March to 17 March. Once this past week of data had been collected, the 10,000 most recent tweets were fetched every twenty minutes (the Twitter API can be queried at 15-minute intervals; 5 minutes were added as a precaution). Thus, as long as no more than 10,000 tweets were posted within a 20-minute interval, all tweets could be gathered. When fewer than 10,000 tweets were posted within 20 minutes, the same tweets were retrieved again in successive iterations. For this reason, duplicate records were removed using the tweet ID. The RapidMiner Turbo Prep application was used for this step.
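
The authors performed deduplication in RapidMiner Turbo Prep. The Python sketch below is only an equivalent illustration of the idea: keep the first record seen for each tweet ID. It also includes a hypothetical helper, `poll_may_have_gap`, which flags a poll that hit the cap without overlapping the previous poll, the situation in which tweets could have been missed. The function names and sample IDs are assumptions for the example.

```python
from typing import Iterable, Iterator


def drop_duplicates_by_id(rows: Iterable[dict], key: str = "Id") -> Iterator[dict]:
    """Yield each row once, keeping the first occurrence of every tweet ID."""
    seen = set()
    for row in rows:
        tweet_id = row[key]
        if tweet_id in seen:
            continue
        seen.add(tweet_id)
        yield row


def poll_may_have_gap(previous_ids, current_ids, cap=10_000):
    """True when a poll hit the cap and shares no ID with the previous poll.

    In that case more than `cap` tweets may have been posted between the
    two polls, so some could be missing.
    """
    return len(current_ids) >= cap and not (set(previous_ids) & set(current_ids))


# Two overlapping polls with made-up IDs, for illustration only.
poll_1 = [{"Id": "1001"}, {"Id": "1002"}]
poll_2 = [{"Id": "1002"}, {"Id": "1003"}]

unique = list(drop_duplicates_by_id(poll_1 + poll_2))
print([row["Id"] for row in unique])
```

Keeping IDs as strings avoids any loss of precision if the rows are later passed through tools that treat large integers as floating-point numbers.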

Turkish tweets containing the word "Corona" were collected from 11 March, while those containing the word "Covid" were collected after 16 March. As the words "Kovid" and "Korona" came into widespread use, from 30 March onward all Turkish tweets containing at least one of the 4 keywords were collected using the search phrase "Covid OR Kovid OR Korona OR Corona".
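
The search phrase is simply the four keywords joined with OR. The sketch below builds it and adds a hypothetical local filter, `mentions_keyword`, that approximates the same condition with a case-insensitive substring test. This is an assumption for illustration only: Twitter's own matching rules may differ, and the filter does not handle the Turkish dotted capital İ.

```python
KEYWORDS = ["Covid", "Kovid", "Korona", "Corona"]


def build_search_phrase(keywords):
    """Join keywords with OR, as in the search phrase quoted above."""
    return " OR ".join(keywords)


def mentions_keyword(text, keywords=KEYWORDS):
    """Rough local check: does the text contain any keyword, ignoring case?

    Plain substring matching is an approximation. It does not reproduce
    Twitter's search semantics and does not handle the Turkish dotted
    capital I.
    """
    lowered = text.casefold()
    return any(keyword.casefold() in lowered for keyword in keywords)


print(build_search_phrase(KEYWORDS))
```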

Instructions: 

The original CSV data file was compressed with WinRAR for easier upload and download. The compressed file is 76 MB.

This data can be used for text mining tasks such as topic modelling and sentiment analysis.

[1] IBRAHIM SABUNCU, ZEYNEP YUREK, "Corona Virus (COVID-19) Turkish Tweets Dataset", IEEE Dataport, 2020. [Online]. Available: http://dx.doi.org/10.21227/0wf0-0792. Accessed: May. 30, 2020.