Commonsense Reasoning Question Generation and Its Applications
- Submitted by: Jianxing Yu
- Last updated: Sat, 07/01/2023 - 12:58
- DOI: 10.21227/vvbh-jm94
Abstract
Each data instance consists of a paragraph (context), a question, and 4 candidate answers. The goal of each system is to determine the most plausible answer to the question by reading the paragraph.
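The snippet below is a minimal sketch of how one might iterate over such instances, assuming the data is stored as JSON Lines with hypothetical field names (`context`, `question`, `answers`, `label`); the actual files in the zip archives may use a different schema, so check them before use.

```python
import json

def load_instances(path):
    """Yield one data instance per line from a JSON Lines file (assumed format)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

for example in load_instances("train.jsonl"):   # hypothetical file name
    context = example["context"]      # the paragraph to read
    question = example["question"]    # the question about the paragraph
    candidates = example["answers"]   # the 4 candidate answers
    gold = example.get("label")       # index of the correct answer (may be absent in test)
    # A QA system scores each candidate against (context, question)
    # and selects the most plausible one.
    break
```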
## Expected Output for the Leaderboard
If you intend to submit to the leaderboard [here](https://leaderboard.allenai.org/cosmosqa/submissions/get-started), please follow the data format described on that page. The prediction file should contain one label per line.
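As a hedged sketch of the "one label per line" format: write the predicted answer index for each test instance, in order, one per line. The file name `predictions.lst` and the `predictions` list below are assumptions for illustration; consult the leaderboard page for the exact requirements.

```python
# Placeholder model outputs: one predicted answer index per test instance, in order.
predictions = [2, 0, 3, 1]

with open("predictions.lst", "w", encoding="utf-8") as f:
    for label in predictions:
        f.write(f"{label}\n")   # exactly one label per line
```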
Since the tasks of QG and QA are complementary, we conducted experiments on two typical data sets in the field of commonsense reasoning QA: \emph{Cosmos QA}~\cite{DBLP:conf/emnlp/HuangBBC19} and \emph{MCScript 2.0}~\cite{DBLP:conf/lrec/0002MRTP18}. These data sets were split into train/dev/test sets of 25.6k/3k/7k and 14.2k/2.0k/3.6k samples, respectively. Their samples mostly require multi-hop reasoning over complex contexts and commonsense understanding. They are thus more suitable than other data sets such as \emph{CommonsenseQA}~\cite{DBLP:conf/naacl/TalmorHLB19}, which provides no textual context, \emph{SQuAD}~\cite{DBLP:conf/emnlp/RajpurkarZLL16}, which does not require multi-hop deduction, and \emph{LogiQA}~\cite{DBLP:conf/ijcai/LiuCLHWZ20}, whose generic questions (e.g., ``\emph{Which one is true?}'') can be generated by rules.
Dataset Files
- dataset_MCScript-2.0.zip (7.40 MB)
- dataset_CosmosQA.zip (7.78 MB)