*.JSON (ZIP)
NCBI: The NCBI dataset is a biomedical corpus of 793 PubMed abstracts, each manually annotated with disease mentions and their corresponding concepts, providing a high-quality gold standard for research on disease name recognition and normalization.
This work presents a dataset built on multiple metrics, namely key quality indicators (KQIs), which capture the end-to-end (E2E) conditions of different services. In particular, the dataset covers video streaming and cloud gaming (CG) services.
This benchmark dataset accompanies the paper titled "Learning to Reuse Distractors to support Multiple Choice Question Generation in Education". It contains a test set of 298 educational questions covering multiple subjects and languages, and a multilingual pool of 77K distractors. The goal is, given a question, to propose a list of relevant candidate distractors from the pool.
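To make the distractor-proposal task concrete, the sketch below ranks pool entries by character-trigram overlap with the correct answer. This is only an illustrative baseline under stated assumptions, not the method of the accompanying paper; the function names (`char_ngrams`, `jaccard`, `rank_distractors`) and the toy pool are hypothetical.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a space-padded, lowercased string."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_distractors(answer, pool, k=3):
    """Return the k pool entries most similar to the answer, excluding the answer itself."""
    answer_grams = char_ngrams(answer)
    candidates = [d for d in pool if d.lower() != answer.lower()]
    ranked = sorted(
        candidates,
        key=lambda d: jaccard(answer_grams, char_ngrams(d)),
        reverse=True,
    )
    return ranked[:k]


# Toy pool (hypothetical data, not taken from the dataset).
pool = ["mitochondria", "mitosis", "photosynthesis", "Paris"]
print(rank_distractors("mitochondrion", pool, k=2))
```

A surface-similarity baseline like this ignores the question text entirely; a learned ranker would normally score candidates against both the question and the answer.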
This paper investigates the problem of generating multiple questions for a given context paragraph. Existing question generation (QG) models do not account for intra-group similarity and type diversity when forming a question group. These attributes are critical for employing QG techniques in educational applications. This paper proposes a two-stage framework that combines neural language models with a genetic algorithm for the question group generation task.
We constructed a rich AttackDB that consists of CTI from the MITRE ATT&CK Enterprise knowledge base, the AlienVault Open Threat Exchange, the IBM X-Force Exchange, and VirusTotal.
Bitcoin block-format files obtained with Bitcoin-ETL (blk00000000-blk00159999).
We study the ability of neural networks to steer or control trajectories of dynamical systems on graphs, which we represent with neural ordinary differential equations (neural ODEs). To do so, we introduce a neural-ODE control (NODEC) framework and find that it can learn control signals that drive graph dynamical systems into desired target states. While we use loss functions that do not constrain the control energy, our results show that NODEC produces control signals that are highly correlated with optimal (or minimum energy) control signals.
We present GeoCoV19, a large-scale Twitter dataset related to the ongoing COVID-19 pandemic. The dataset was collected over a period of 90 days, from February 1 to May 1, 2020, and consists of more than 524 million multilingual tweets. Because geolocation information is essential for many tasks, such as disease tracking and surveillance, we employed a gazetteer-based approach to extract toponyms from user location and tweet content and derive their geolocation information using Nominatim (OpenStreetMap) data at different levels of geolocation granularity. In terms of geographical coverage, the dataset spans 218 countries and 47K cities worldwide. The tweets in the dataset come from more than 43 million Twitter users, including around 209K verified accounts. These users posted tweets in 62 different languages.
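The gazetteer-based toponym extraction mentioned above can be illustrated with a minimal sketch. It assumes a tiny in-memory gazetteer with made-up entries and a greedy longest-match lookup; the actual GeoCoV19 pipeline and its use of Nominatim data are not reproduced here, and `GAZETTEER` and `extract_toponyms` are hypothetical names.

```python
import re

# Toy gazetteer: hypothetical entries for illustration, not Nominatim data.
GAZETTEER = {
    "paris": {"country": "France", "level": "city"},
    "new york": {"country": "United States", "level": "city"},
    "france": {"country": "France", "level": "country"},
}
# Longest place name in the gazetteer, measured in tokens.
MAX_NGRAM = max(len(name.split()) for name in GAZETTEER)


def extract_toponyms(text):
    """Return (name, record) pairs for gazetteer entries found in text.

    Scans left to right and prefers the longest matching token n-gram,
    so "new york" is matched as one place rather than token by token.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_NGRAM, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in GAZETTEER:
                matches.append((candidate, GAZETTEER[candidate]))
                i += n  # skip past the matched span
                break
        else:
            i += 1  # no match starting at this token
    return matches


for name, record in extract_toponyms("Stuck at home in New York, missing Paris"):
    print(name, record["level"], record["country"])
```

The `level` field stands in for the different granularity levels (country, city, and so on); a real gazetteer would also need disambiguation for place names shared by several locations.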