Distractor Retrieval Dataset

Citation Author(s): Semere Kiros Bitew (Ghent University)

Amir Hadifar (Ghent University)

Lucas Sterckx

Johannes Deleu (Ghent University)

Chris Develder (Ghent University)

Thomas Demeester (Ghent University)
Submitted by: Semere Kiros Bitew
Last updated: Mon, 10/24/2022 - 11:54
DOI: 10.21227/gnpy-d910
Data Format: *.JSON (ZIP)
Links: Learning to Reuse Distractors to support Multiple Choice Question Generation in…

Categories:

Keywords:

Natural Language Processing

Online Learning

Machine Learning

Artificial Intelligence

Abstract

This benchmark dataset accompanies the paper titled "Learning to Reuse Distractors to support Multiple Choice Question Generation in Education". It contains a test set of 298 educational questions covering multiple subjects and languages, and a multilingual pool of 77K distractors. The goal is, given a question, to propose a list of relevant candidate distractors from that pool.

Instructions:

The dataset is provided as a ZIP file containing two folders. The first folder, 'test-MCQs', contains 6 JSON files, one per subject, with the subject indicated by the filename: English, French, Natural Sciences, History, Biology and Geography. The second folder, 'vocab', contains a single JSON file listing the distractors in 'distractor': 'frequency' format.
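
The layout described above could be read with a few lines of Python. This is a minimal sketch under stated assumptions, not an official loader: the function names (load_vocab, load_test_mcqs, most_frequent) are hypothetical, the ZIP is assumed to be already extracted to a directory passed as root, the vocab file is assumed to parse to a single JSON object mapping each distractor to a numeric frequency, and the internal structure of the question files is not documented here, so they are returned as parsed without interpretation.

```python
import json
from pathlib import Path


def _json_files(folder):
    # Match the extension case-insensitively (.json or .JSON); the exact
    # filenames inside the archive are not stated, so none are hard-coded.
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() == ".json")


def load_vocab(root):
    """Load the distractor pool from the single JSON file in 'vocab'.

    Assumption: the file holds one object of the form {distractor: frequency}.
    """
    files = _json_files(Path(root) / "vocab")
    if len(files) != 1:
        raise FileNotFoundError(
            f"expected exactly one JSON file in 'vocab', found {len(files)}")
    with files[0].open(encoding="utf-8") as fh:
        return json.load(fh)


def load_test_mcqs(root):
    """Return {filename stem: parsed JSON} for each file in 'test-MCQs'.

    The per-question schema is not assumed; content is returned as parsed.
    """
    subjects = {}
    for path in _json_files(Path(root) / "test-MCQs"):
        with path.open(encoding="utf-8") as fh:
            subjects[path.stem] = json.load(fh)
    return subjects


def most_frequent(vocab, n=10):
    """Return the n (distractor, frequency) pairs with the highest frequency."""
    return sorted(vocab.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Calling load_vocab and load_test_mcqs on the extracted directory should give the distractor pool and one entry per subject file; most_frequent is only a convenience for inspecting the pool and is not a retrieval method from the paper.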