Robust VQA (RVQA) Dataset with Language Prior and Compositional Reasoning Labels

Citation Author(s):
Souvik Chowdhury, NIT Silchar
Badal Soni, NIT Silchar
Submitted by:
Souvik Chowdhury
Last updated:
Wed, 11/27/2024 - 01:07
DOI:
10.21227/9cjm-dx19
Data Format:
License:

Abstract 

This dataset is designed to advance research in Visual Question Answering (VQA), specifically addressing challenges related to language priors and compositional reasoning. It incorporates question labels categorizing queries based on their susceptibility to either issue, allowing for targeted evaluation of VQA models. The dataset consists of 33,051 training images and 14,165 validation images, along with 571,244 training questions and 245,087 validation questions. Among the training questions, 313,664 focus on compositional reasoning, while 257,580 pertain to language prior. Similarly, the validation questions are categorized into 134,313 for compositional reasoning and 110,774 for language prior. This dataset serves as a benchmarking tool for evaluating models' performance across these two challenges, providing insights into areas that require further improvement. The comprehensive dataset preparation process, including image collection, caption generation, prompt creation, QA pair generation, and quality control, is outlined in the accompanying algorithm. The dataset is designed to be extensible to other image sources and can be a valuable resource for researchers focusing on VQA tasks involving complex reasoning.

Instructions: 

The download contains two parts.

The zip file contains all images for the training and validation sets.

train.json contains the training question-answer pairs.

valid.json contains the validation questions only (answers are withheld).

To obtain the answers for the validation set (valid.json), please send an email request to souvikcho@gmail.com or badal@cse.nits.ac.in.
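Once downloaded, the JSON annotations can be grouped by their challenge label (language prior vs. compositional reasoning) for the targeted evaluation the abstract describes. The sketch below is a minimal example; the field names (`image_id`, `question`, `answer`, `label`) and the label values are assumptions, as the actual schema of train.json may differ.

```python
import json

# Hypothetical sample mimicking the assumed train.json schema;
# the real file is a larger collection of such records.
sample = json.loads("""
[
  {"image_id": "img_001",
   "question": "What color is the chair left of the table?",
   "answer": "brown", "label": "compositional"},
  {"image_id": "img_002",
   "question": "What sport is being played?",
   "answer": "tennis", "label": "language_prior"}
]
""")

def split_by_label(entries):
    """Group QA entries by their challenge label so each
    subset can be evaluated separately."""
    groups = {}
    for entry in entries:
        groups.setdefault(entry["label"], []).append(entry)
    return groups

groups = split_by_label(sample)
print(len(groups["compositional"]))   # 1
print(len(groups["language_prior"]))  # 1
```

Evaluating a model on each group separately yields per-challenge accuracy, which is the benchmarking use case the dataset targets.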

Funding Agency: 
No Funding Agency