*.JSON (ZIP)

NCBI; BC5CDR; i2b2 2010; HPRD50; AIMed; MedNLI

NCBI: The NCBI dataset is a biomedical corpus of 793 PubMed abstracts, each manually annotated with disease mentions and their corresponding concepts, providing a high-quality gold standard for research on disease name recognition and normalization.

Categories:

Artificial Intelligence

E2E dataset of Video Streaming and Cloud Gaming services over 4G and 5G

This work presents a dataset built on multiple metrics, namely key quality indicators (KQIs), which capture the end-to-end (E2E) conditions of different services. In particular, the dataset covers video streaming and cloud gaming (CG) services.

Distractor Retrieval Dataset

This benchmark dataset accompanies the paper titled "Learning to Reuse Distractors to support Multiple Choice Question Generation in Education". It contains a test set of 298 educational questions covering multiple subjects and languages, and a 77K-entry multilingual pool of distractor vocabulary. The goal is, for a given question, to propose a list of relevant candidate distractors from the pool.
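The retrieval task described above can be illustrated with a toy baseline. The sketch below ranks pool entries by Jaccard token overlap with the question; this is a simple lexical stand-in chosen for illustration, not the method of the accompanying paper, and the function names, example question, and four-entry pool are all hypothetical.

```python
def tokenize(text):
    """Lowercase and split on whitespace; returns a set of tokens."""
    return set(text.lower().split())


def rank_distractors(question, pool, top_k=2):
    """Return the top_k pool entries by Jaccard overlap with the question."""
    q_tokens = tokenize(question)
    scored = []
    for candidate in pool:
        c_tokens = tokenize(candidate)
        union = q_tokens | c_tokens
        score = len(q_tokens & c_tokens) / len(union) if union else 0.0
        scored.append((score, candidate))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [candidate for _, candidate in scored[:top_k]]


# Hypothetical question and pool, for illustration only.
question = "What is the capital city of France"
pool = ["capital city of Spain", "city of Lyon", "a prime number", "the river Seine"]
candidates = rank_distractors(question, pool, top_k=2)
```

A realistic system would likely replace the token-overlap score with a learned or multilingual similarity measure, since the pool spans several languages.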

EQGG-RACE

This paper investigates the generation of multiple questions for a given context paragraph. Existing question generation (QG) models ignore intra-group similarity and type diversity when forming a question group, yet these attributes are critical for using QG techniques in educational applications. This paper proposes a two-stage framework that combines neural language models with a genetic algorithm for the question group generation task.

Categories:

Artificial Intelligence

Attack DB OTX-XFORCE-VT

We constructed a rich AttackDB that consists of cyber threat intelligence (CTI) from the MITRE ATT&CK Enterprise knowledge base, the AlienVault Open Threat Exchange, the IBM X-Force Exchange, and VirusTotal.

Categories:

Security

Bitcoin Block Data

Bitcoin block format files obtained with Bitcoin-ETL (blk00000000-blk00159999)

Categories:

Standards Research Data

Measurements of Cryptographic Primitives Execution on Android Devices

This dataset is supplementary material for the paper "A Comprehensive and Reproducible Comparison of Cryptographic Primitives Execution on Android Devices". It contains the measurements collected from 17 mobile devices and the code needed for reproducibility.

Neural Ordinary Differential Equation Control of Dynamics on Graphs

We study the ability of neural networks to steer or control trajectories of dynamical systems on graphs, which we represent with neural ordinary differential equations (neural ODEs). To do so, we introduce a neural-ODE control (NODEC) framework and find that it can learn control signals that drive graph dynamical systems into desired target states. While we use loss functions that do not constrain the control energy, our results show that NODEC produces control signals that are highly correlated with optimal (or minimum energy) control signals.
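The control setup described above can be made concrete with a small simulation. The sketch below is not NODEC itself: it assumes diffusive coupling on a three-node path graph, uses forward Euler integration, and substitutes a hand-written proportional controller (the hypothetical `proportional_controller` with gain `k`) for a trained neural network. It only shows the quantities involved: the state trajectory, the distance to the target state, and the control energy.

```python
def simulate(adjacency, controller, x0, dt=0.01, steps=500):
    """Euler-integrate dx_i/dt = sum_j A_ij (x_j - x_i) + u_i(x).

    Returns the final state and the accumulated control energy,
    approximated as the sum over steps of dt * ||u||^2.
    """
    n = len(x0)
    x = list(x0)
    energy = 0.0
    for _ in range(steps):
        u = controller(x)
        # Diffusive coupling: each node is pulled toward its neighbours.
        coupling = [
            sum(adjacency[i][j] * (x[j] - x[i]) for j in range(n))
            for i in range(n)
        ]
        x = [x[i] + dt * (coupling[i] + u[i]) for i in range(n)]
        energy += dt * sum(ui * ui for ui in u)
    return x, energy


def proportional_controller(target, k=5.0):
    """Placeholder controller (assumption): u = k * (target - x)."""
    def control(x):
        return [k * (t - xi) for t, xi in zip(target, x)]
    return control


# Hypothetical example: 3-node path graph steered to a uniform target state.
path_graph = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
target = [2.0, 2.0, 2.0]
final_state, energy = simulate(
    path_graph, proportional_controller(target), [0.0, 1.0, 3.0]
)
terminal_loss = sum((a - b) ** 2 for a, b in zip(final_state, target))
```

In a learned setting, `control` would be a parameterized network trained to minimize a loss such as `terminal_loss`; the accumulated `energy` can then be compared against a minimum-energy baseline without appearing in the loss.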

real world Chinese mathematics dataset

An overview of a real-world Chinese mathematics dataset from which duplicate questions and simple questions have been removed.

GeoCoV19: A Dataset of Hundreds of Millions of Multilingual COVID-19 Tweets with Location Information

We present GeoCoV19, a large-scale Twitter dataset related to the ongoing COVID-19 pandemic. The dataset was collected over a period of 90 days, from February 1 to May 1, 2020, and consists of more than 524 million multilingual tweets. Because geolocation information is essential for many tasks such as disease tracking and surveillance, we employed a gazetteer-based approach to extract toponyms from user location and tweet content and derived their geolocation information using the Nominatim (OpenStreetMap) data at different geolocation granularity levels. In terms of geographical coverage, the dataset spans over 218 countries and 47K cities in the world. The tweets in the dataset are from more than 43 million Twitter users, including around 209K verified accounts. These users posted tweets in 62 different languages.
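The gazetteer-based toponym extraction mentioned above can be sketched as a longest-match lookup over word n-grams. This is a minimal illustration, not the GeoCoV19 pipeline: the in-memory `GAZETTEER` dictionary is a hypothetical stand-in for Nominatim data, the function name and `max_words` parameter are assumptions, and the tokenizer handles ASCII letters only.

```python
import re

# Hypothetical toy gazetteer: lowercase place name -> (display name, country code).
GAZETTEER = {
    "doha": ("Doha", "QA"),
    "new york": ("New York", "US"),
    "milan": ("Milan", "IT"),
}


def extract_toponyms(text, gazetteer=GAZETTEER, max_words=3):
    """Return gazetteer entries found in text, preferring the longest match."""
    tokens = re.findall(r"[a-z]+", text.lower())
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at position i first.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in gazetteer:
                matches.append(gazetteer[candidate])
                i += n
                break
        else:
            i += 1  # no match starting at this token
    return matches


places = extract_toponyms("Lockdown extended in New York; stay safe, Milan!")
```

The same function could be applied separately to the user location field and to the tweet text, keeping track of which source produced each match.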

