This dataset is a set of eighteen directed networks that represent message exchanges among Twitter accounts during eighteen crisis events. The dataset comprises 645,339 anonymized unique user IDs and 1,396,709 edges, each labeled with one of Plutchik's basic emotions (anger, fear, sadness, disgust, joy, trust, anticipation, and surprise) or "neutral" (if a tweet conveys no emotion).
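A minimal sketch of how one of these emotion-labeled directed networks could be held in memory. It assumes each edge is available as a hypothetical (source user ID, target user ID, emotion) triple; the actual file layout is not described here. Edges keep their direction, so a reply from u1 to u2 and one from u2 to u1 are stored separately.

```python
from collections import Counter, defaultdict

# Plutchik's eight basic emotions plus the "neutral" label used by the dataset.
EMOTIONS = {"anger", "fear", "sadness", "disgust", "joy",
            "trust", "anticipation", "surprise", "neutral"}


def build_emotion_network(edges):
    """Build a directed adjacency list and per-emotion edge counts.

    edges: iterable of (source_id, target_id, emotion) triples
    (hypothetical input layout, not the dataset's documented format).
    """
    out_edges = defaultdict(list)  # source -> [(target, emotion), ...]
    emotion_counts = Counter()
    for src, dst, emotion in edges:
        if emotion not in EMOTIONS:
            raise ValueError(f"unexpected emotion label: {emotion!r}")
        out_edges[src].append((dst, emotion))
        emotion_counts[emotion] += 1
    return out_edges, emotion_counts
```

The per-emotion counts give a quick profile of one crisis network, which can then be compared across the eighteen events.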
The dataset is composed of 595,460 users, 14,273,311 links, 1,345,913 diffusion cascades, and 1,311,498 tags, covering Mar 24 to Apr 25, 2012. To capture more information cascades, Weng et al. tracked a group of users connected by mutual following. Thus, the follower network is an undirected network made up of a number of disconnected components.
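The mutual-following rule explains why the follower network is undirected: an edge is kept only when both users follow each other. A minimal sketch of that rule and of counting the resulting connected components with union-find, assuming the raw follow relations are available as hypothetical (follower, followee) pairs; this illustrates the idea and is not Weng et al.'s own code.

```python
def mutual_follow_graph(follows):
    """Keep only reciprocated follows, as undirected edges (sorted pairs).

    follows: iterable of (follower, followee) pairs (hypothetical input layout).
    """
    follow_set = set(follows)
    return {tuple(sorted((a, b))) for a, b in follow_set
            if a != b and (b, a) in follow_set}


def count_components(nodes, edges):
    """Count connected components with union-find; nodes must cover all edge endpoints."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    return len({find(n) for n in nodes})
```

One-way follows (for example, b following c without c following b) are dropped, which is how a single follow graph can split into several disconnected components.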
This dataset is provided for the fake news detection task. Besides general fake news detection, it is specifically suited to detecting fake news with Natural Language Inference (NLI).
This dataset is designed to be compatible with both the LIAR test dataset and the FakeNewsNet (PolitiFact) dataset as evaluation data. There are two folders, each containing three CSV files.
1. 15,212 training samples, 1,058 validation samples, and 1,054 test samples, which are the same as the FakeNewsNet (PolitiFact) data. The classes in this data are "real" and "fake".
2. 15,052 training samples, 1,265 validation samples, and 1,266 test samples, which are the same as the LIAR test data. The classes in this data are "pants-fire", "false", "barely true", "half-true", "mostly-true", and "true".
The dataset columns:
id: Matches the id in the PolitiFact website API (unique for each sample)
date: The time each article was published on the PolitiFact website
speaker: The person or organization to whom the statement relates
statement: A claim published in the media by a person or an organization that has been investigated in the PolitiFact article
sources: The sources used to analyze each statement
paragraph_based_content: The article content, stored as a list of paragraphs
fullText_based_content: The full text, formed by joining the paragraphs
label: The class for each sample
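For NLI, each sample can be turned into a premise/hypothesis pair. A minimal sketch, assuming two things the description does not state: that `paragraph_based_content` is serialized in the CSV as a Python-style list literal, and that the article text serves as the premise while `statement` serves as the hypothesis.

```python
import ast
import csv
import io


def load_nli_pairs(csv_text):
    """Turn one split's CSV text into (premise, hypothesis, label) triples.

    Assumptions: paragraph_based_content is a Python-style list literal;
    the article body is the premise and the claim is the hypothesis.
    """
    pairs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # literal_eval parses the stringified list without executing code.
        paragraphs = ast.literal_eval(row["paragraph_based_content"])
        premise = " ".join(p.strip() for p in paragraphs)
        pairs.append((premise, row["statement"], row["label"]))
    return pairs
```

For a real split, read one of the CSV files from either folder as text and pass it in; the resulting triples can be fed to any NLI-style classifier, with `label` mapped to the classes of that folder.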
This dataset includes tweets related to the US November 2020 election that contain #USAelection or at least one of the following keywords about four parties:
Keywords about Democratic Party:
@DNC OR @TheDemocrats OR Biden OR @JoeBiden OR "Our best days still lie ahead" OR "No Malarkey!"
Keywords about Green Party:
@GreenPartyUS OR @TheGreenParty OR "Howie Hawkins" OR @HowieHawkins OR "Angela Walker" OR @AngelaNWalker
Keywords about Libertarian Party:
@LPNational OR "Jo Jorgersen" OR @Jorgensen4POTUS OR "Spike Cohen" OR @RealSpikeCohen
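A minimal sketch of applying the keyword lists above to tweet text, to tag which parties a tweet mentions. It assumes case-insensitive substring matching, which only approximates Twitter's own search semantics (for example, "@DNC" would also match "@DNCWarRoom"). Only the three parties whose keywords are listed here are included, and the keywords are copied exactly as given.

```python
PARTY_KEYWORDS = {
    "Democratic": ["@DNC", "@TheDemocrats", "Biden", "@JoeBiden",
                   "Our best days still lie ahead", "No Malarkey!"],
    "Green": ["@GreenPartyUS", "@TheGreenParty", "Howie Hawkins",
              "@HowieHawkins", "Angela Walker", "@AngelaNWalker"],
    "Libertarian": ["@LPNational", "Jo Jorgersen", "@Jorgensen4POTUS",
                    "Spike Cohen", "@RealSpikeCohen"],
}


def matching_parties(text):
    """Return the set of parties with at least one keyword in the text (case-insensitive)."""
    lowered = text.lower()
    return {party for party, keywords in PARTY_KEYWORDS.items()
            if any(kw.lower() in lowered for kw in keywords)}


def is_in_scope(text):
    """A tweet qualifies if it has #USAelection or any party keyword (OR semantics)."""
    return "#usaelection" in text.lower() or bool(matching_parties(text))
```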
Currently, the dataset contains 3.5 million tweets, sent from 1 July 2020 until 12 August 2020, with six attributes per tweet.
The data file is a comma-separated values (CSV) file, compressed with WinRAR for easier upload and download. It contains the following six columns for each tweet:
Created-At: Exact creation time of the tweet
From-User-Id: Sender User Id
To-User-Id: The recipient's user ID, if the tweet is sent to a user
Language: Language of the tweet, coded in ISO 639-1; 91.7% of tweets are en (English), 3.9% und (unidentified), and 2.15% es (Spanish)
Retweet-Count: number of retweets
Id: Unique ID of the tweet
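The language percentages above are shares of tweets per code in the Language column. A minimal sketch of that computation, assuming the archive has already been extracted to plain CSV text with the column names listed above (Python's standard library cannot open RAR archives).

```python
import csv
import io
from collections import Counter


def language_shares(csv_text):
    """Percentage of tweets per Language code, rounded to two decimals."""
    counts = Counter(row["Language"] for row in csv.DictReader(io.StringIO(csv_text)))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: round(100 * n / total, 2) for lang, n in counts.items()}
```

The `csv` module reads every field as a string, which also keeps the Id and user ID columns exact; tweet IDs exceed 2^53, so tools that parse them as floating-point numbers can silently corrupt them.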
These data can be used to predict the election result with sentiment analysis and predictive analytics. Text mining techniques such as topic modelling can also reveal the main issues Twitter users raise about the US election.
Modern science is built on systematic experimentation and observation. The reproducibility and replicability of experiments and observations are central to science. However, reproducibility and replicability are not always guaranteed, a situation sometimes referred to as the 'crisis of reproducibility'. To analyze the extent of the crisis, we conducted an online survey on the state of reproducibility in remote sensing. The respondents' answers are saved in this dataset in full-text CSV format.
The file contains the answers to our online survey on reproducibility in remote sensing. It is in comma-separated values (CSV) format with full-text answers, i.e., each answer is stored as its full text rather than as a numeric code, which makes the answers easy to read and analyse.
The dataset also includes the report generated by the website that hosted the survey (kwiksurveys.com). This can be used for a quick overview of the results, and also to see the original questions and the possible answers.
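Because answers are stored as text, tallying a question needs no codebook lookup. A minimal sketch, assuming one column per question and one answer per cell; the question column name passed in is hypothetical, since the actual header text is not given here.

```python
import csv
import io
from collections import Counter


def tally_answers(csv_text, question_column):
    """Count how often each full-text answer occurs for one question, skipping blanks."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(answer for answer in (row[question_column].strip() for row in reader)
                   if answer)
```

If a multiple-choice question stores several selected options in one cell, those cells would need to be split before counting; that layout is an open assumption here.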
Reddit is one of the largest social media websites in the world, and it contains valuable data about its users and their perspectives, organized into virtual communities, or subreddits, based on common areas of interest. Substance use issues are particularly salient within this online community due to the burgeoning substance use (opioid) crisis within the United States, among other countries. The Philadelphia, Pennsylvania, USA region is a particularly important location for understanding user perceptions of opioids because of the prevalence of overdose deaths there. To collect user sen
Included are the dataset as a CSV file, a data dictionary for all variables (column key) as a text file, the keyword list used to query the Reddit API as a text file, and the list of targeted subreddits as a text file. The dataset comprises entries (submissions and comments) that matched the keyword query within the targeted subreddits. The data dictionary defines the designations for submissions and comments: a submission is a first-order entry within a subreddit, and a comment is an entry posted in response to a submission or to another comment. Rows include all matching entries within the targeted subreddits from January 1, 2005 to May 14, 2020.
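The submission/comment designation forms a reply tree. A minimal sketch of computing each entry's reply depth, assuming hypothetical `id` and `parent_id` fields with `parent_id` set to `None` for submissions; the real column names are defined in the data dictionary and may differ. Because only keyword-matching entries were collected, a comment's parent may be missing, and the sketch then treats that comment as the root of a partial thread.

```python
def reply_depths(entries):
    """Return {entry_id: depth}: submissions have depth 0, and each comment
    is one level deeper than the entry it replies to.

    entries: iterable of dicts with hypothetical keys "id" and "parent_id".
    """
    parent = {e["id"]: e["parent_id"] for e in entries}
    depths = {}

    def depth(entry_id):
        chain = []
        current = entry_id
        # Walk up iteratively to avoid deep recursion on long threads.
        while current not in depths:
            p = parent[current]
            if p is None or p not in parent:
                # Submission, or a comment whose parent is not in the data.
                depths[current] = 0
                break
            chain.append(current)
            current = p
        d = depths[current]
        # Fill in depths back down the chain, nearest ancestor first.
        for node in reversed(chain):
            d += 1
            depths[node] = d
        return depths[entry_id]

    return {entry_id: depth(entry_id) for entry_id in parent}
```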
There are 56,979 rows of data in the CSV file.