Relation-Associated Instructions & Hallucination Benchmark

Citation Author(s):
Zhiyang Chen, Yousong Zhu, Yufei Zhan, Zhaowen Li, Chaoyang Zhao, Jinqiao Wang, Ming Tang
Submitted by:
Zhiyang Chen
Last updated:
Mon, 07/08/2024 - 01:14
DOI:
10.21227/33jh-2m65
License:

Abstract 

Large vision-language models (LVLMs) suffer from hallucination, occasionally generating responses that contradict the image content. The key problem lies in their weak ability to comprehend detailed content in multi-modal contexts, which can be mainly attributed to their training data. Existing vision instruction datasets primarily focus on global descriptions that are highly relevant to the image, with few samples containing image details. Therefore, we construct a fine-grained vision instruction dataset, RAI-30k, by generating image-text pairs with detailed relationship annotations from the panoptic scene graph dataset (PSG). These conversations pay more attention to detailed facts in the image, encouraging the model to answer questions based on multi-modal contexts. Moreover, to provide a deeper evaluation of hallucination in LVLMs, we propose a new benchmark, RAH-Bench. It divides vision hallucination into three types that contradict the image with wrong categories, attributes, or relations, and introduces False Positive Rate as a detailed sub-metric for each type. We hope the provided dataset and benchmark will benefit future research on large vision-language models.
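As a rough illustration of the per-type sub-metric, the sketch below computes a False Positive Rate over negatively phrased questions whose ground-truth answer is "no" (i.e., questions that contradict the image). The field names and the grouping by hallucination type are assumptions for illustration, not the benchmark's documented schema.

```python
from collections import defaultdict

def false_positive_rate(samples, predictions):
    """Per-type FPR: fraction of negative (ground-truth "no") questions
    that the model wrongly answers "yes".

    Assumptions: each item in `samples` carries a hallucination "type"
    ("category", "attribute", or "relation") and has ground-truth answer
    "no"; `predictions` is a parallel list of model answers ("yes"/"no").
    """
    fp = defaultdict(int)      # wrongly answered "yes", per type
    total = defaultdict(int)   # number of negative questions, per type
    for sample, pred in zip(samples, predictions):
        t = sample["type"]
        total[t] += 1
        if pred.strip().lower().startswith("yes"):
            fp[t] += 1
    return {t: fp[t] / total[t] for t in total if total[t] > 0}
```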

Instructions: 

We provide JSON Lines files for the dataset and the benchmark.
Each line is one sample, containing the image identifier, the question, and the answer.
The images come from the COCO 2017 dataset.
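For reference, a minimal loading sketch in Python is given below. The file name (rai_30k.jsonl) and the field names (image, question, answer) are assumptions and should be checked against the released files.

```python
import json
from pathlib import Path

def load_jsonl(path):
    """Read a JSON Lines file: one JSON object (one sample) per line."""
    samples = []
    with Path(path).open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                samples.append(json.loads(line))
    return samples

# Hypothetical usage; file name and field names are assumptions.
samples = load_jsonl("rai_30k.jsonl")
for s in samples[:3]:
    print(s.get("image"), s.get("question"), s.get("answer"))
```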

Comments

Enjoy!

Submitted by Zhiyang Chen on Mon, 07/08/2024 - 01:15