AI Ethics Global Document Collection

Daniel Schiff, Jason Borenstein, Justin Biddle, & Kelly Laas

Documents in the dataset were published between January 2016 and July 2019.

This dataset is associated with a (forthcoming) paper in IEEE Transactions on Technology and Society, entitled "AI Ethics in the Public, Private, and NGO Sectors: A Review of a Global Document Collection."

Instructions: 

The instructions and codebook are included in the first sheets of the dataset.

This dataset is a set of eighteen directed networks that represent message exchanges among Twitter accounts during eighteen crisis events. The dataset comprises 645,339 anonymized unique user IDs and 1,396,709 edges, labeled with respect to Plutchik's basic emotions (anger, fear, sadness, disgust, joy, trust, anticipation, and surprise) or "neutral" (if a tweet conveys no emotion).
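
As a minimal sketch of working with one of these networks, the Python below assumes each edge is available as a (source_id, target_id, label) triple. That layout, the function name emotion_profile, and the sample IDs are hypothetical, not the dataset's documented file format. The sketch tallies edges per emotion and rejects any label outside the nine listed above; because the networks are directed, u1 -> u2 and u2 -> u1 count as separate edges.

```python
from collections import Counter

# Plutchik's eight basic emotions plus "neutral", as listed in the description.
LABELS = {"anger", "fear", "sadness", "disgust", "joy",
          "trust", "anticipation", "surprise", "neutral"}

def emotion_profile(edges):
    """Count edges per emotion label in one directed crisis network.

    `edges` is an iterable of (source_id, target_id, label) triples; this
    layout is an assumption for illustration only.
    """
    counts = Counter()
    for src, dst, label in edges:
        if label not in LABELS:
            raise ValueError(f"unexpected label {label!r} on edge {src}->{dst}")
        counts[label] += 1
    return counts

# Hypothetical anonymized IDs; direction matters, so u1->u2 and u2->u1 differ.
sample = [("u1", "u2", "fear"), ("u2", "u1", "trust"),
          ("u3", "u1", "fear"), ("u1", "u3", "neutral")]
print(emotion_profile(sample).most_common(1))  # [('fear', 2)]
```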

The ASU/UNSW-CICMOD01 dataset was developed to support the novel Cyber Influence Campaign (CIC) model and ontology. It contains full captures of specific tags (hashtags) regarding individual cyber influence campaigns, scraped from Twitter and Instagram.

The dataset is composed of 595,460 users, 14,273,311 links, 1,345,913 diffusion cascades, and 1,311,498 tags from Mar 24 to Apr 25, 2012. To capture more information cascades, Weng et al. tracked a group of users connected by mutual following relationships. As a result, the follower network is undirected and consists of a number of disconnected components.
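
Why mutual following yields an undirected network with separate components can be seen in a small sketch. It assumes the raw input is a list of directed (follower, followee) pairs, which is an assumption rather than Weng et al.'s documented construction; the function name is hypothetical. An undirected edge is kept only when both directions exist, and an iterative depth-first search then collects the connected components. Users with no mutual follow do not appear in any component.

```python
from collections import defaultdict

def mutual_follow_components(follows):
    """Return connected components of the undirected mutual-follow graph.

    `follows` holds directed (follower, followee) pairs (hypothetical input);
    an undirected edge is kept only when both directions are present.
    """
    follow_set = set(follows)
    adj = defaultdict(set)
    for a, b in follow_set:
        if a != b and (b, a) in follow_set:
            adj[a].add(b)
            adj[b].add(a)

    seen, components = set(), []
    for start in list(adj):
        if start in seen:
            continue
        # Iterative depth-first search collecting one component.
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(comp)
    return components

# The one-way follow (2, 3) is dropped, leaving two disconnected components.
follows = [(1, 2), (2, 1), (2, 3), (3, 4), (4, 3)]
comps = mutual_follow_components(follows)
print(sorted(sorted(c) for c in comps))  # [[1, 2], [3, 4]]
```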

A proposed circuit for an electronic choke (ballast) for tube lights.

I have invented a new circuit design for an electronic fan regulator that is inexpensive and effective.

This dataset is provided for the fake news detection task. Besides general fake news detection, it can be used specifically to detect fake news with Natural Language Inference (NLI).

Instructions: 

This dataset is designed to be compatible with both the LIAR test dataset and the FakeNewsNet (PolitiFact) dataset as evaluation data. There are two folders, each containing three CSV files:

1. 15,212 training samples, 1,058 validation samples, and 1,054 test samples, where the test samples are the same as the FakeNewsNet (PolitiFact) data. The classes in this data are "real" and "fake".

2. 15,052 training samples, 1,265 validation samples, and 1,266 test samples, where the test samples are the same as the LIAR test data. The classes in this data are "pants-fire", "false", "barely-true", "half-true", "mostly-true", and "true".

The dataset columns are:

id: Matches the ID in the PolitiFact website API (unique for each sample)

date: The time each article was published on the PolitiFact website

speaker: The person or organization to whom the statement relates

statement: A claim, published in the media by a person or an organization, that has been investigated in the PolitiFact article

sources: The sources used to analyze each statement

paragraph_based_content: The article content, stored as a list of paragraphs

fullText_based_content: The full text, formed by joining the paragraphs

label: The class for each sample
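
A minimal sketch of reading these columns with Python's standard csv module follows. The one-row sample and its values are hypothetical, and so is the assumption that paragraph_based_content is serialized in the CSV as a Python list literal; if the files use another encoding (for example JSON), the parsing step would change accordingly. Joining paragraphs with a single space is likewise an assumed convention.

```python
import ast
import csv
import io

# Hypothetical one-row CSV using the column names listed above.
SAMPLE = '''id,date,speaker,statement,sources,paragraph_based_content,fullText_based_content,label
1,2020-01-01,Example Speaker,An example claim.,Example source,"['First paragraph.', 'Second paragraph.']",First paragraph. Second paragraph.,false
'''

def load_rows(text):
    """Yield rows as dicts, decoding the paragraph list into a Python list."""
    for row in csv.DictReader(io.StringIO(text)):
        # ast.literal_eval parses the (assumed) list literal without executing code.
        row["paragraph_based_content"] = ast.literal_eval(row["paragraph_based_content"])
        yield row

row = next(load_rows(SAMPLE))
# Joining with a single space is an assumption; it matches this sample's full text.
assert " ".join(row["paragraph_based_content"]) == row["fullText_based_content"]
print(row["label"], len(row["paragraph_based_content"]))  # false 2
```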

This dataset includes 24,201,654 tweets related to the US Presidential Election on November 3, 2020, collected between July 1, 2020, and November 11, 2020. For each tweet, the related party name, the sentiment scores, and the words that affect the score were added to the dataset.

Instructions: 

The dataset contains more than 20 million tweets, each with 11 attributes. The data file is in comma-separated values (CSV) format and is 3.48 GB in size. It was compressed with WinRAR for easier upload and download; the compressed file is 766 MB. It contains the following 11 columns for each tweet:

Created-At: Exact creation time of the tweet [Jul 1, 2020 7:44:48 PM to Nov 12, 2020 5:47:59 PM]
From-User-Id: Unique ID of the user who sent the tweet
To-User-Id: Unique ID of the user the tweet was sent to
Language: Language of the tweet, coded in ISO 639-1 [90% of tweets en: English; 3.8% und: Unidentified; 2.5% es: Spanish]
Retweet-Count: Number of retweets
PartyName: Label showing which party the tweet is about: [Democrats] or [Republicans] if the tweet contains any keyword (given above) related to the Democratic or Republican party, respectively. If it contains keywords about both parties, the label is [Both]. If it does not contain any keyword about either major party (Democratic or Republican), the label is [Neither].
Id: Unique ID of the tweet
Score: The sentiment score of the tweet. A positive (negative) score means positive (negative) emotion.
Scoring String: Nominal attribute listing all words that contributed to the score
Negativity: The sum of negative components
Positivity: The sum of positive components
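
The PartyName rule above can be sketched in a few lines of Python. The keyword sets here are hypothetical placeholders, since the actual keywords used to build the dataset are not reproduced in this description, and the word tokenization is an assumed simplification.

```python
import re

# Hypothetical keyword lists; not the keywords actually used for the dataset.
DEM_KEYWORDS = {"biden", "harris", "democrat", "democrats"}
REP_KEYWORDS = {"trump", "pence", "republican", "republicans", "gop"}

def party_name(text):
    """Assign the PartyName label using the keyword rule described above."""
    # Lowercase alphabetic tokens; "#Biden2020" yields the token "biden".
    words = set(re.findall(r"[a-z]+", text.lower()))
    dem = bool(words & DEM_KEYWORDS)
    rep = bool(words & REP_KEYWORDS)
    if dem and rep:
        return "Both"
    if dem:
        return "Democrats"
    if rep:
        return "Republicans"
    return "Neither"

print(party_name("Biden rally tonight"))     # Democrats
print(party_name("Trump vs Biden debate"))   # Both
print(party_name("Go vote on November 3!"))  # Neither
```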

Tweets were scored with VADER (Valence Aware Dictionary and sEntiment Reasoner), a lexicon- and rule-based sentiment analysis algorithm. It is specifically attuned to sentiments expressed in social media and produces scores based on a dictionary of words. The operator used here calculates and then exposes the sum of all sentiment word scores in the text. For more details about this algorithm: https://github.com/cjhutto/vaderSentiment
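
To illustrate how the Score, Scoring String, Negativity, and Positivity columns relate, here is a toy lexicon-sum scorer. It is not VADER: the lexicon and its valences are made up, and VADER's heuristics for negation, intensifiers, capitalization, and punctuation are not reproduced. Reporting Negativity as a negative sum is also an assumption; the dataset may store its magnitude instead.

```python
import re

# Hypothetical toy lexicon with made-up valences (NOT VADER's lexicon).
TOY_LEXICON = {"good": 2.0, "great": 3.0, "win": 2.5, "bad": -2.5, "lose": -2.0}

def score_tweet(text):
    """Score one text by summing the valences of matched lexicon words.

    Returns a dict keyed like the dataset columns; Score is the plain sum
    of all matched word scores.
    """
    words = re.findall(r"[a-z']+", text.lower())
    hits = [(w, TOY_LEXICON[w]) for w in words if w in TOY_LEXICON]
    positivity = sum(v for _, v in hits if v > 0)
    negativity = sum(v for _, v in hits if v < 0)
    return {
        "Score": positivity + negativity,
        "Scoring String": " ".join(f"{w} ({v})" for w, v in hits),
        "Negativity": negativity,
        "Positivity": positivity,
    }

result = score_tweet("Great win, but bad debate")
print(result["Score"], result["Scoring String"])  # 3.0 great (3.0) win (2.5) bad (-2.5)
```

For real VADER scores, the vaderSentiment package linked above provides SentimentIntensityAnalyzer, whose polarity_scores method returns neg, neu, pos, and compound values.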

This data can be used to develop methods for predicting election results from social media. It can also be used in text mining studies, such as tracking how feelings toward the parties change in tweets, identifying the topics that cause positive or negative feelings about the candidates, and understanding the main issues that concern Twitter users regarding the US election.

Modern science is built on systematic experimentation and observation. The reproducibility and replicability of experiments and observations are central to science. However, they are not always guaranteed, a problem sometimes referred to as the 'crisis of reproducibility'. To analyze the extent of this crisis, we conducted an online survey on the state of reproducibility in remote sensing. The respondents' answers are saved in this dataset in full-text CSV format.

Instructions: 

The file contains the answers to our online survey on reproducibility in remote sensing. The format is comma-separated values (CSV) in full text, i.e., the answers are saved as their full text instead of as numbers, which makes them easy to understand and analyse.

The dataset also includes the report generated by the website that hosted the survey (kwiksurveys.com). This can be used for a quick overview of the results, and also to see the original questions and the possible answers.
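
Because answers are stored as full text rather than numeric codes, tallying a question needs no lookup table. The sketch below assumes a hypothetical header and question wording; the real questions and column names are in the dataset's CSV and the kwiksurveys.com report.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt; real question wording and headers are in the dataset.
SAMPLE = """respondent,Have you tried to reproduce a published result?
1,Yes
2,No
3,Yes
"""

def tally(text, question):
    """Count the full-text answers given to one question column."""
    return Counter(row[question] for row in csv.DictReader(io.StringIO(text)))

q = "Have you tried to reproduce a published result?"
print(tally(SAMPLE, q))  # Counter({'Yes': 2, 'No': 1})
```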
