Artificial Intelligence
AI Ethics Global Document Collection
Daniel Schiff, Jason Borenstein, Justin Biddle, & Kelly Laas
Documents in the dataset were published between January 2016 and July 2019.
This dataset is associated with a forthcoming paper in IEEE Transactions on Technology and Society, entitled "AI Ethics in the Public, Private, and NGO Sectors: A Review of a Global Document Collection."
Wildfires are among the deadliest and most dangerous natural disasters in the world. They burn millions of forests and endanger many human and animal lives. Predicting fire behavior can help firefighters manage fires and schedule resources for future incidents, and it also reduces the risk to the firefighters' own lives. Recent advances in aerial imaging show that aerial images can be beneficial in wildfire studies.
The LGP dataset (LGPSSD) consists of LGP samples collected at an industrial site through the image acquisition device of an LGP defect detection system. In this dataset, NG samples are treated as positive samples and OK samples as negative samples.
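The labeling convention above can be sketched as a small mapping. This is a minimal illustration, assuming each sample carries an inspection result string of either "NG" or "OK"; the function name `to_label` and the sample file names are hypothetical and are not part of the dataset.

```python
# Hypothetical sketch of the labeling convention:
# NG -> positive class (1), OK -> negative class (0).
LABELS = {"NG": 1, "OK": 0}


def to_label(inspection_result: str) -> int:
    """Map an inspection result string to a binary class label."""
    key = inspection_result.strip().upper()
    if key not in LABELS:
        raise ValueError(f"unknown inspection result: {inspection_result!r}")
    return LABELS[key]


# Made-up samples for illustration only: (file name, inspection result).
samples = [("sample_001.png", "NG"), ("sample_002.png", "OK")]
labeled = [(name, to_label(result)) for name, result in samples]
```

Raising on unknown strings, rather than defaulting to one class, keeps mislabeled records from silently entering either class.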
This dataset has been created from a collection of 56403 multidisciplinary book titles from Springer, available through the Hellenic Academic Libraries Link (https://www.heal-link.gr/en/home-2/) subscription. To obtain this dataset, a parser was created for extracting relevant information, such as the title, subtitle and ToC, from each book. The extracted information was stored in a database for further processing. Each book title in the database includes information regarding the bookid, title, and ToC.
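As a rough illustration of the storage step described above, the following sketch keeps one row per book with the three fields named in the description (bookid, title, ToC). The table name, column types, helper functions, and the example record are all assumptions made for illustration; the actual parser and database used to build the dataset are not described in the source.

```python
import sqlite3

# Hypothetical schema: one row per book, using the three fields
# named in the dataset description.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (bookid TEXT PRIMARY KEY, title TEXT NOT NULL, toc TEXT)"
)


def store_book(bookid, title, toc_entries):
    """Insert one parsed book; ToC entries are stored newline-separated."""
    conn.execute(
        "INSERT INTO books (bookid, title, toc) VALUES (?, ?, ?)",
        (bookid, title, "\n".join(toc_entries)),
    )
    conn.commit()


def load_toc(bookid):
    """Return the ToC entries for a book, or an empty list if it is absent."""
    row = conn.execute(
        "SELECT toc FROM books WHERE bookid = ?", (bookid,)
    ).fetchone()
    return row[0].split("\n") if row else []


# Made-up record for illustration only.
store_book("b-0001", "An Example Title", ["Introduction", "Methods", "Conclusion"])
```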
This is a large Chinese taxonomic knowledge base, translated from Probase by a neural network.
It has 11,292,493 IsA pairs with an accuracy of 86.6%.
Amidst the COVID-19 pandemic, cyberbullying has become an even more serious threat. Our work aims to investigate the viability of an automatic multiclass cyberbullying detection model that is able to classify whether a cyberbully is targeting a victim’s age, ethnicity, gender, religion, or other quality. Previous literature has not yet explored making fine-grained cyberbullying classifications of such magnitude, and existing cyberbullying datasets suffer from quite severe class imbalances.
This is a large Chinese commonsense knowledge base, which is translated from ConceptNet 5.6, with around 2 million triples and an accuracy of 89.6%.
This heart disease dataset is curated by combining 3 popular heart disease datasets. The first dataset (collected from Kaggle) contains 70000 records with 11 independent features, which makes it the largest heart disease dataset available so far for research purposes. These data were collected at the time of medical examination and from information given by the patient. The second and third datasets contain 303 and 293 instances, respectively, with 13 common features. The three datasets used for its curation are:
- Cardio Data (Kaggle Dataset)
Data for the study has been retrieved from a publicly available data set of a leading European P2P lending platform, Bondora (https://www.bondora.com/en). The retrieved data is a pool of both defaulted and non-defaulted loans from the time period between 1st March 2009 and 27th January 2020. The data comprises demographic and financial information of borrowers and loan transactions. In P2P lending, loans are typically uncollateralized and lenders seek higher returns as compensation for the financial risk they take.