Commonsense Reasoning Question Generation and Its Applications

Citation Author(s):
Jianxing Yu
Sun Yat-sen University
Submitted by:
Jianxing Yu
Last updated:
Sat, 07/01/2023 - 12:58
DOI:
10.21227/vvbh-jm94
Data Format:
License:

Abstract 

Each data instance consists of a paragraph (context), a question, and four candidate answers. The goal of each system is to determine the most plausible answer to the question by reading the paragraph.
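For concreteness, here is a minimal sketch of reading such instances from a JSON-lines file. The field names (`context`, `question`, `answer0`–`answer3`, `label`) are assumptions based on the commonly distributed Cosmos QA release; adjust them if your copy of the data differs.

```python
import json

def load_instances(path):
    """Yield one QA instance per line of a JSON-lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

# Assumed fields: 'context', 'question', 'answer0'..'answer3', and
# 'label' (index of the correct answer; typically absent in test data).
for inst in load_instances("train.jsonl"):
    candidates = [inst[f"answer{i}"] for i in range(4)]
    gold = inst.get("label")
    print(inst["question"], candidates, gold)
    break
```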

## Expected Output to Leaderboard
If you intend to submit to the leaderboard [here](https://leaderboard.allenai.org/cosmosqa/submissions/get-started), please follow the data format described on that page. The prediction file should contain one label per line.
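As an illustration, a one-label-per-line prediction file can be written as follows. The label values and output filename below are placeholders; the leaderboard page above specifies the exact requirements.

```python
# Hypothetical predictions: one integer label (0-3, the index of the
# chosen answer) per test instance, in the same order as the test file.
predictions = [2, 0, 3, 1]  # placeholder values

with open("predictions.txt", "w", encoding="utf-8") as f:
    for label in predictions:
        f.write(f"{label}\n")
```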


Since the tasks of QG and QA are complementary, we conducted experiments on two data sets typical of commonsense reasoning QA: \emph{Cosmos QA}~\cite{DBLP:conf/emnlp/HuangBBC19} and \emph{MCScript 2.0}~\cite{DBLP:conf/lrec/0002MRTP18}. These data sets were split into train/dev/test sets of 25.6k/3k/7k and 14.2k/2.0k/3.6k samples, respectively. Their samples mostly require multi-hop reasoning over complex context together with commonsense understanding. This makes them more suitable than alternatives such as \emph{CommonsenseQA}~\cite{DBLP:conf/naacl/TalmorHLB19}, which provides no text context; \emph{SQuAD}~\cite{DBLP:conf/emnlp/RajpurkarZLL16}, which does not require multi-hop deduction; and \emph{LogiQA}~\cite{DBLP:conf/ijcai/LiuCLHWZ20}, whose general questions such as ``\emph{Which one is true?}'' can be generated by rules.
