QLAIM Dataset

Citation Author(s): Megha Sundriyal
Submitted by: Megha Sundriyal
Last updated: Sun, 01/12/2025 - 06:15
DOI: 10.21227/x8d9-ce21

Abstract 

QLAIM is a fact-checking dataset focused exclusively on quantitative claims. It includes 33,422 fact-checked claims featuring comparative, statistical, interval, and temporal entities. Each claim is accompanied by detailed metadata and supporting evidence, providing a foundation for automated verification. The data is provided in JSON format; each entry contains a claim, its processed version, the fact-checking result, and related metadata.

 

Instructions: 

The data is structured in JSON format.

Each record contains the following fields:

  • Original Claim: The raw, unprocessed version of the claim (may contain escaped Unicode characters such as \u201c, the left double quotation mark).
  • Processed Claim: The cleaned and formatted version of the original claim.
  • Fact-check Link: URL to the fact-checking page that evaluates the claim.
  • Publisher Name: The name of the publisher or fact-checking organization (e.g., "PolitiFact").
  • Claim Date: The date when the claim was made.
  • Published Date: The date when the fact-check result was published (may be null).
  • Verification Lag: The time difference between the claim date and the fact-check publication date (may be null).
  • Language: The language of the claim (e.g., "en" for English).
  • Regex Explanation: List of patterns or explanations related to regex used in processing (currently empty: []).
  • NER Explanation: Entities identified in the claim by named entity recognition (NER) (e.g., "overnight").
  • Confidence Score: A score representing the model's confidence in the fact-check result (from 0 to 1).
  • Fact-check Rating: The outcome of the fact-checking process (e.g., "False").
  • id: A unique identifier for each claim.
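
The field list above can be read with a few lines of Python. The following is a minimal sketch, not an official loader. It assumes that the file holds a top-level JSON array of objects and that the keys match the field names exactly as listed. The sample record is a synthetic placeholder written for this example, not an entry from the dataset; the date format, the list type of "NER Explanation", and the integer id are assumptions. The helper names load_records, confident, and rating_counts are hypothetical.

```python
import json
from collections import Counter

# Synthetic placeholder record (NOT taken from the dataset). Keys follow the
# field names listed above; all values are invented for illustration, and the
# date format, list-valued NER field, and integer id are assumptions.
SAMPLE_JSON = r'''
[
  {
    "Original Claim": "\u201cExample claim with 3 numbers\u201d",
    "Processed Claim": "Example claim with 3 numbers",
    "Fact-check Link": "https://example.org/fact-check/1",
    "Publisher Name": "PolitiFact",
    "Claim Date": "2020-01-01",
    "Published Date": null,
    "Verification Lag": null,
    "Language": "en",
    "Regex Explanation": [],
    "NER Explanation": ["overnight"],
    "Confidence Score": 0.9,
    "Fact-check Rating": "False",
    "id": 1
  }
]
'''


def load_records(text):
    """Parse JSON text, assuming a top-level array of claim objects."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array of records")
    return records


def confident(records, threshold=0.5):
    """Keep records whose Confidence Score is at least `threshold`.

    A missing or null score is treated as 0.0.
    """
    return [r for r in records if (r.get("Confidence Score") or 0.0) >= threshold]


def rating_counts(records):
    """Count records per Fact-check Rating value."""
    return Counter(r.get("Fact-check Rating") for r in records)


records = load_records(SAMPLE_JSON)
first = records[0]

# json.loads turns the \u201c escape into the actual quotation mark character.
print(first["Original Claim"])

# JSON null becomes Python None, so nullable fields need an explicit check.
print(first["Published Date"] is None)

print(rating_counts(confident(records, threshold=0.8)))
```

To apply the same helpers to a downloaded file, read it as UTF-8 text and pass the contents to load_records; the file name depends on the download and is not specified here.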