hallucination

We develop a novel TCM hallucination detection dataset, Hallu-TCM, since no prior work has attempted this task in TCM. We selected 1,260 TCM exam questions covering 16 TCM subjects, input them into GPT-4, and collected its responses. In the first level, we used the Qwen-Max interface to annotate each response multiple times with a binary label. If Qwen-Max consistently provided the same label across annotations, we adopted that label. For contentious cases, we recruited higher-degree research students who can understand and solve complex questions, including three Ph.D.
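
The first-level consistency check described above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the function names, the question IDs, the number of annotation rounds, and the convention that 1 means "hallucinated" and 0 means "not hallucinated" are all assumptions. The labels are taken as given, so no Qwen-Max call is made here.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def resolve_label(labels: Sequence[int]) -> Optional[int]:
    """Return the shared label if every annotation round agrees, else None."""
    if not labels:
        raise ValueError("at least one annotation round is required")
    if any(label not in (0, 1) for label in labels):
        raise ValueError("labels must be binary (0 or 1)")
    # A unanimous set of rounds collapses to a single distinct value.
    return labels[0] if len(set(labels)) == 1 else None


def split_by_agreement(
    rounds: Dict[str, Sequence[int]],
) -> Tuple[Dict[str, int], List[str]]:
    """Split responses into adopted labels and contentious cases.

    `rounds` maps a hypothetical question ID to the labels produced by
    repeated automatic annotation of the same response.
    """
    adopted: Dict[str, int] = {}
    contentious: List[str] = []
    for question_id, labels in rounds.items():
        label = resolve_label(labels)
        if label is None:
            # Disagreement across rounds: hand over to human annotators.
            contentious.append(question_id)
        else:
            adopted[question_id] = label
    return adopted, contentious


if __name__ == "__main__":
    # Hypothetical example: three annotation rounds per response.
    example = {"q001": [1, 1, 1], "q002": [0, 1, 0], "q003": [0, 0, 0]}
    adopted, contentious = split_by_agreement(example)
    print(adopted)
    print(contentious)
```

Only unanimous labels are adopted in this sketch; any disagreement sends the item to the human annotation stage mentioned in the description.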

Large vision-language models (LVLMs) suffer from hallucination, occasionally generating responses that clearly contradict the image content. The key problem lies in their weak ability to comprehend detailed content in multi-modal contexts, which can be mainly attributed to their training data. Vision instruction datasets primarily focus on global descriptions that are highly relevant to the image, with few samples containing image details.
