QLAIM Dataset

Citation Author(s): Megha Sundriyal
Submitted by: Megha Sundriyal
Last updated: Sun, 01/12/2025 - 11:15
DOI: 10.21227/x8d9-ce21
Data Format: *.csv; *.json


Categories:

Artificial Intelligence

Keywords:

Fact-Checking

Computational Linguistics

Social Media


Abstract

QLAIM is a fact-checking dataset focused exclusively on quantitative claims. It includes 33,422 fact-checked claims featuring comparative, statistical, interval, and temporal entities. Each claim is accompanied by detailed metadata and supporting evidence, providing a foundation for automated verification. The dataset is provided in JSON format; each entry contains a claim, its processed version, the fact-checking result, and relevant metadata.

Instructions:

The data is structured in JSON format.

Each record contains the following fields:

Original Claim: The raw, unprocessed version of the claim (may contain escaped Unicode characters such as \u201c, a left double quotation mark).
Processed Claim: The cleaned and formatted version of the original claim.
Fact-check Link: URL to the fact-checking page that evaluates the claim.
Publisher Name: The name of the publisher or fact-checking organization (e.g., "PolitiFact").
Claim Date: The date when the claim was made.
Published Date: The date when the fact-check result was published (may be null).
Verification Lag: The time difference between the claim date and the fact-check publication (may be null).
Language: The language of the claim (e.g., "en" for English).
Regex Explanation: List of patterns or explanations related to regex used in processing (currently empty: []).
NER Explanation: Named entities (NER) identified in the claim (e.g., "overnight").
Confidence Score: A score representing the model's confidence in the fact-check result (from 0 to 1).
Fact-check Rating: The outcome of the fact-checking process (e.g., "False").
id: A unique identifier for each claim.
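
The field list above can be exercised with a short Python sketch. It assumes the JSON file holds an array of objects whose keys match the field names exactly as written; the actual file layout, the date and lag formats, and the type of NER Explanation are not stated here, so treat those as assumptions. The sample records and the helper names (load_records, has_publication_info, rating_counts) are hypothetical and only illustrate how escaped characters and null values might be handled.

```python
import json
from collections import Counter

# Hypothetical sample in the ASSUMED layout: a JSON array of objects whose
# keys match the field names listed above. All values are illustrative
# placeholders, not records from the dataset; the date format, the lag
# format, and the list type of "NER Explanation" are assumptions.
SAMPLE_JSON = r'''
[
  {
    "id": 1,
    "Original Claim": "\u201cPrices doubled overnight\u201d",
    "Processed Claim": "Prices doubled overnight",
    "Fact-check Link": "https://example.org/fact-check/1",
    "Publisher Name": "PolitiFact",
    "Claim Date": "2024-01-01",
    "Published Date": "2024-01-04",
    "Verification Lag": "3 days",
    "Language": "en",
    "Regex Explanation": [],
    "NER Explanation": ["overnight"],
    "Confidence Score": 0.9,
    "Fact-check Rating": "False"
  },
  {
    "id": 2,
    "Original Claim": "Prices doubled overnight again",
    "Processed Claim": "Prices doubled overnight again",
    "Fact-check Link": "https://example.org/fact-check/2",
    "Publisher Name": "PolitiFact",
    "Claim Date": "2024-02-01",
    "Published Date": null,
    "Verification Lag": null,
    "Language": "en",
    "Regex Explanation": [],
    "NER Explanation": ["overnight"],
    "Confidence Score": 0.5,
    "Fact-check Rating": "False"
  }
]
'''


def load_records(text):
    """Parse the JSON text; json.loads decodes escapes such as \\u201c."""
    return json.loads(text)


def has_publication_info(record):
    """True when neither nullable field (Published Date, Verification Lag) is null."""
    return (
        record.get("Published Date") is not None
        and record.get("Verification Lag") is not None
    )


def rating_counts(records):
    """Count records per Fact-check Rating value."""
    return Counter(r.get("Fact-check Rating") for r in records)


records = load_records(SAMPLE_JSON)
dated = [r for r in records if has_publication_info(r)]

print(len(records), "records;", len(dated), "with publication info")
print(rating_counts(records))
```

To run the same steps on the downloaded data, replace SAMPLE_JSON with the contents of the JSON file, for example by opening it with encoding="utf-8" and passing the text to load_records. If the file turns out to be structured differently (for instance one object per line), the loading step would need to change accordingly.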