Computational Linguistics

QLAIM Dataset

A fact-checking dataset focused exclusively on quantitative claims. It includes 33,422 fact-checked claims featuring comparative, statistical, interval, and temporal entities. Each claim is accompanied by detailed metadata and supporting evidence, providing a foundation for automated verification. The dataset is provided in JSON format; each entry contains a claim, its processed version, the fact-checking results, and relevant metadata.
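
Since the entries are distributed as JSON, a first look at the data might resemble the minimal sketch below. The field names (`claim`, `label`, `category`) and the three sample entries are assumptions made up for illustration; they are not the dataset's documented schema, so adjust them to the keys found in the actual files.

```python
import json
from collections import Counter

# Hypothetical sample entries: the keys and values below are invented for
# illustration and are NOT taken from the dataset itself.
SAMPLE = """[
  {"claim": "Unemployment fell by 2% in 2020.", "label": "False", "category": "statistical"},
  {"claim": "City A has more residents than City B.", "label": "True", "category": "comparative"},
  {"claim": "The law passed between 1990 and 1995.", "label": "True", "category": "interval"}
]"""


def load_claims(text):
    """Parse a JSON array of claim entries into a list of dicts."""
    entries = json.loads(text)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array of claim entries")
    return entries


def count_by(entries, key):
    """Count entries by the value stored under `key`, skipping entries that lack it."""
    return Counter(entry[key] for entry in entries if key in entry)


claims = load_claims(SAMPLE)
label_counts = count_by(claims, "label")
print(label_counts)
```

To work with a downloaded file instead of the inline sample, the same helper could be fed the contents of that file, for example `load_claims(open(path, encoding="utf-8").read())`, where `path` is wherever the JSON file was saved.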

Categories: Artificial Intelligence

35 Views

Shahmukhi Database SMDB- SMHaroof V2

Punjabi Shahmukhi Alphabet dataset for machine learning projects SMDB V2

Categories: Artificial Intelligence
Machine Learning

239 Views

LATIC: A Non-native Pre-labelled Mandarin Chinese Validation Corpus for Automatic Speech Scoring and Evaluation Task

LATIC focuses on non-native learners of Mandarin Chinese. It is an annotated non-native speech database for Chinese that is fully open source and available online for any use. Application areas include automatic speech scoring and evaluation, as well as derived uses such as L2 teaching and the teaching of Chinese as a foreign language. We aim to provide a relatively small-scale and highly efficient dataset for training and validation. To this end, four selected non-native Chinese speakers participated in the project; their mother tongues (L1s) are Russian, Korean, French, and Arabic.

Categories: Artificial Intelligence
Machine Learning
Communications

1916 Views