Datasets
Standard Dataset
Hallu-TCM
- Citation Author(s):
- Submitted by:
- Wenjing Yue
- Last updated:
- Sun, 10/13/2024 - 01:25
- DOI:
- 10.21227/cq97-4g34
- Data Format:
- License:
- Categories:
- Keywords:
Abstract
We developed Hallu-TCM, a novel hallucination detection dataset for Traditional Chinese Medicine (TCM), since no prior work has attempted this task in TCM. We selected 1,260 TCM exam questions covering 16 TCM subjects, input them into GPT-4, and collected its feedback. At the first level, we used the Qwen-Max interface to annotate each piece of feedback multiple times with a binary label. If Qwen-Max consistently produced the same label across annotations, we adopted that label. For contentious cases, we recruited research students with advanced degrees who can understand and solve complex questions (three Ph.D. students and one master's student) to perform a secondary annotation, using any available tools to assist them. Finally, one TCM physician annotated any feedback that remained controversial after the students' annotations to make the final decision.
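The first-level consistency rule described above can be sketched as follows. This is a minimal illustration, not the authors' code; the function name and the review-flag convention are assumptions.

```python
from collections import Counter

def first_level_label(llm_labels):
    """First-level annotation: adopt the label only if the LLM
    (e.g. Qwen-Max) returned the same binary label on every run;
    otherwise flag the case as contentious for human annotators."""
    assert llm_labels and all(l in (0, 1) for l in llm_labels)
    counts = Counter(llm_labels)
    if len(counts) == 1:                # unanimous across runs
        return llm_labels[0], False     # (label, needs_human_review)
    return None, True                   # escalate to student annotators

# Unanimous runs are adopted; any disagreement is escalated.
print(first_level_label([1, 1, 1]))   # (1, False)
print(first_level_label([0, 1, 0]))   # (None, True)
```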
"example": [
{
"index": 0,
"analysis": "standard analysis"
"response": "Feedback generated by LLMs"
"query": "Question text",
"answer": "Answer text",
"answer_list": "Option list, split by \n",
"KN": "TCM subject",
"question_type": "A12",
"lime_fragment": "Feedback need to process by the Important Token Extraction module",
"tcm_master_explanation": "TCM keywords provided by the Important Token Extraction module",
"master_explanation": "General keywords provided by the Important Token Extraction module",
"fact_list": [
"Need to generate atomic fact list elements based on 'lime_fragment'",
...
""
],
"evidence_list": [
"reference fragments",
...,
""
],
"title": [
"title of reference fragments",
...,
""
],
"label": 0/1
},
...,
]
label description:
- 0: True (the feedback is factual)
- 1: False (the feedback contains hallucination)
Documentation
| Attachment | Size |
|---|---|
| readme.doc | 974 bytes |