QLAIM Dataset
- Submitted by: Megha Sundriyal
- Last updated: Sun, 01/12/2025 - 06:15
- DOI: 10.21227/x8d9-ce21
Abstract
QLAIM is a fact-checking dataset focused exclusively on quantitative claims. It includes 33,422 fact-checked claims featuring comparative, statistical, interval, and temporal entities. Each claim is accompanied by detailed metadata and supporting evidence, providing a robust foundation for automated verification. The dataset is provided in JSON format, with each entry containing a claim, its processed version, the fact-checking result, and relevant metadata.
Instructions:
The data is structured in JSON format.
Each record contains the following fields:
- Original Claim: The raw, unprocessed version of the claim (contains special characters such as \u201c).
- Processed Claim: The cleaned and formatted version of the original claim.
- Fact-check Link: URL to the fact-checking page that evaluates the claim.
- Publisher Name: The name of the publisher or fact-checking organization (e.g., "PolitiFact").
- Claim Date: The date when the claim was made.
- Published Date: The date when the fact-check result was published (may be null).
- Verification Lag: The time difference between the claim date and the fact-check publication (may be null).
- Language: The language of the claim (e.g., "en" for English).
- Regex Explanation: List of patterns or explanations related to regex used in processing (currently empty: []).
- NER Explanation: Named entities (NER) identified in the claim (e.g., "overnight").
- Confidence Score: A score representing the model's confidence in the fact-check result (from 0 to 1).
- Fact-check Rating: The outcome of the fact-checking process (e.g., "False").
- id: A unique identifier for each claim.
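
The field list above can be read with a few lines of Python. This is a minimal sketch, not the dataset's official loader. It assumes that the file holds a top-level JSON array of records and that the JSON keys match the field labels exactly (for example "Original Claim"). Neither assumption is confirmed by this page. The two sample records, their values, and the helper names load_records, confident, and rating_counts are invented for illustration.

```python
import json
from collections import Counter

# Hypothetical sample in the assumed layout: a JSON array whose keys match
# the field labels listed above. All values here are made up.
SAMPLE = r'''
[
  {
    "Original Claim": "\u201cCrime fell 20% overnight\u201d",
    "Processed Claim": "Crime fell 20% overnight",
    "Fact-check Link": "https://example.org/fact-check/1",
    "Publisher Name": "PolitiFact",
    "Published Date": null,
    "Verification Lag": null,
    "Language": "en",
    "Regex Explanation": [],
    "NER Explanation": ["overnight"],
    "Confidence Score": 0.91,
    "Fact-check Rating": "False",
    "id": 1
  },
  {
    "Original Claim": "Unemployment is below 4%",
    "Processed Claim": "Unemployment is below 4%",
    "Fact-check Link": "https://example.org/fact-check/2",
    "Publisher Name": "PolitiFact",
    "Published Date": null,
    "Verification Lag": null,
    "Language": "en",
    "Regex Explanation": [],
    "NER Explanation": [],
    "Confidence Score": 0.40,
    "Fact-check Rating": "True",
    "id": 2
  }
]
'''


def load_records(text):
    """Parse JSON text and check that it is a list of records (assumed layout)."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array of records")
    return records


def confident(records, threshold=0.5):
    """Keep records whose Confidence Score is at least `threshold`.

    A missing or null score is treated as 0.0.
    """
    return [r for r in records if (r.get("Confidence Score") or 0.0) >= threshold]


def rating_counts(records):
    """Count records per Fact-check Rating label."""
    return Counter(r.get("Fact-check Rating") for r in records)


records = load_records(SAMPLE)

# json.loads decodes the \u201c escape into a real left double quotation mark.
print(records[0]["Original Claim"])

# JSON null becomes Python None, so nullable fields need an explicit check.
print(records[0]["Published Date"] is None)

print(rating_counts(confident(records)))
```

To read the real file, pass its contents to load_records, for example load_records(open(path, encoding="utf-8").read()), where path is wherever the download was saved. If the actual key names or top-level structure differ, adjust the lookups accordingly.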