
Standard Dataset

Commonsense Reasoning Question Generation and Its Applications

Citation Author(s):
Jianxing Yu (Sun Yat-sen University)
Submitted by:
Jianxing Yu
DOI:
10.21227/vvbh-jm94

Abstract

Each data instance consists of a paragraph (context), a question, and 4 candidate answers. The goal of each system is to determine the most plausible answer to the question by reading the paragraph.
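For concreteness, a single instance can be pictured as a small record holding the context paragraph, the question, four answer options, and the index of the correct option. The field names in the sketch below are illustrative assumptions only, not the exact schema of the released files.

```python
# A minimal sketch of one multiple-choice instance. Field names
# ("context", "question", "answers", "label") are illustrative
# assumptions, not the official schema of the released files.
example = {
    "context": (
        "I called my landlord twice about the broken heater, but she "
        "never picked up, so I emailed the property agency instead."
    ),
    "question": "Why did the narrator email the agency?",
    "answers": [
        "Because the heater had already been fixed.",
        "Because the landlord could not be reached by phone.",
        "Because the agency asked for a written complaint.",
        "Because the narrator wanted to cancel the lease.",
    ],
    "label": 1,  # index (0-3) of the most plausible answer
}

def predict(instance: dict) -> int:
    """Placeholder: a real system would read the context and question
    and score each of the four candidate answers."""
    return 1  # dummy prediction for illustration only

assert predict(example) == example["label"]
```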


Since the tasks of QG and QA are complementary, we conducted experiments on two typical data sets in the field of commonsense reasoning QA: \emph{Cosmos QA}~\cite{DBLP:conf/emnlp/HuangBBC19} and \emph{MCScript 2.0}~\cite{DBLP:conf/lrec/0002MRTP18}. These data sets were split into train/dev/test sets of 25.6k/3k/7k and 14.2k/2.0k/3.6k samples, respectively. Most samples require multi-hop reasoning over complex context together with commonsense understanding. They are therefore more suitable than other data sets such as \emph{CommonsenseQA}~\cite{DBLP:conf/naacl/TalmorHLB19}, which provides no text context; \emph{SQuAD}~\cite{DBLP:conf/emnlp/RajpurkarZLL16}, which does not require multi-hop deduction; and \emph{LogiQA}~\cite{DBLP:conf/ijcai/LiuCLHWZ20}, whose general questions such as ``\emph{Which one is true?}'' can be generated by rules.
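A copy of Cosmos QA is also available on the Hugging Face hub; the sketch below assumes that mirror (dataset id `cosmos_qa`, an assumption about the hub copy rather than the files on this page) and only checks that the split sizes roughly match the figures quoted above.

```python
# A minimal sketch, assuming the Hugging Face hub copy of Cosmos QA
# (dataset id "cosmos_qa"); it is not the file release on this page.
# Depending on your `datasets` version, trust_remote_code=True may be needed.
from datasets import load_dataset

cosmos = load_dataset("cosmos_qa")

# Split sizes should roughly match the 25.6k/3k/7k figures quoted above.
for split in ("train", "validation", "test"):
    print(split, len(cosmos[split]))
```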

Instructions:


## Expected Output to Leaderboard
If you intend to submit to the leaderboard [here](https://leaderboard.allenai.org/cosmosqa/submissions/get-started), please follow the data format described on that page. The prediction file should contain one label per line.
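For example, predictions for the test set can be written one per line, in test-set order. The integer-index label format and the file name in the sketch below are assumptions; defer to the exact format described on the leaderboard page.

```python
# A minimal sketch of a one-label-per-line prediction file.
# The 0-3 integer format and the name "predictions.lst" are assumptions;
# follow the exact format described on the leaderboard page.
predictions = [1, 0, 3, 2]  # model outputs, in the order of the test instances

with open("predictions.lst", "w") as f:
    for label in predictions:
        f.write(f"{label}\n")
```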

 


Dataset Files

Files have not been uploaded for this dataset