Computational Linguistics

A fact-checking dataset focused exclusively on quantitative claims. It contains 33,422 fact-checked claims involving comparative, statistical, interval, and temporal entities. Each claim is accompanied by detailed metadata and supporting evidence, which makes the dataset suitable for training and evaluating automated verification systems. The data is provided in JSON format; each entry contains the original claim, its processed version, the fact-checking results, and relevant metadata.
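A minimal sketch of loading such a JSON file and summarizing the fact-checking results, assuming the file is a single top-level JSON array of entries. The key names `claim` and `label` are hypothetical placeholders because the actual schema is not documented here; adjust them to match the released file.

```python
import json
from collections import Counter
from pathlib import Path

# Hypothetical key names: the real schema is not documented here,
# so rename these to match the actual JSON entries.
CLAIM_KEY = "claim"
LABEL_KEY = "label"


def load_claims(path):
    """Load a JSON file assumed to hold a top-level array of claim entries."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array of claim entries")
    return data


def label_distribution(entries):
    """Count entries per fact-check label, skipping entries that lack one."""
    return Counter(e[LABEL_KEY] for e in entries if LABEL_KEY in e)
```

For example, `label_distribution(load_claims("claims.json"))` (file name hypothetical) returns a `Counter` mapping each verdict label to its number of claims, which is a quick way to check class balance before training a verifier.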


SMDB V2: a Punjabi Shahmukhi alphabet dataset for machine learning projects.


LATIC focuses on non-native learners of Mandarin Chinese. It is an annotated non-native speech database for Chinese that is fully open source and freely available online for any purpose. Possible application areas include automatic speech scoring, evaluation, and derivation; L2 teaching; and the teaching of Chinese as a foreign language. The aim is to provide a relatively small-scale but highly efficient training dataset of non-native speech deviations. To this end, four selected non-native Chinese speakers participated in the project; their native languages (L1s) are Russian, Korean, French, and Arabic.
