Education

The Numerical Latin Letters (DNLL) dataset consists of Latin numeric letters organized into 26 distinct letter classes, one for each letter of the Latin alphabet. Each class contains multiple letter forms, resulting in a diverse and extensive collection. The letters vary in color, size, writing style, thickness, background, orientation, luminosity, and other attributes, giving the dataset broad coverage of visual variation.

RITA (Resource for Italian Tests Assessment) is a new NLP dataset of academic exam texts written in Italian by second-language learners seeking CEFR proficiency certification.
The RITA dataset is available for automatic processing in CSV and XML formats, subject to a citation agreement.

This dataset comprises data created during research on AI-generated code, with a focus on software engineering use cases. The purpose of the research was to investigate how AI should be integrated into university software engineering curricula.

With the growing informatization and digitalization of education, Massive Open Online Courses (MOOCs) have been widely adopted in higher vocational education because they offer flexibility of time and place, remove geographic barriers, and enable resource sharing. Our data was crawled from 39 higher vocational education courses on the China University MOOC website (www.icourse163.org). It consists of 40,906 reviews published between February 3, 2018 and May 2, 2021.

The dataset file contains all the relevant data for this paper, including the original text data, labels, and statistical information, which are used for training, testing, and validating the proposed models. Additionally, there is a question bank file that comprises all test questions, filtered test data, and annotated result data after testing. This data is used to evaluate the performance of the models or methods proposed in the paper.

RMUTT-DLD is an aggregated collection of data from the IC3 digital literacy certification program conducted at Rajamangala University of Technology Thanyaburi (RMUTT) in Thailand from 2016 to 2023. The expanded dataset includes demographic details, academic records, and certification results, offering a holistic view of how students' digital literacy progressed over that period. The dataset can be imported into a variety of applications for further use.

Children Arabic Utterances for Mispronunciation Detection Dataset

Audio samples were recorded from 27 Egyptian children (14 boys and 13 girls, aged 7 to 12) pronouncing 16 words. The files are organized into folders and subfolders, with the recordings separated into two folders by pronunciation: Correct and Wrong. The dataset was collected and annotated for segmental pronunciation errors by Arabic linguistics experts from NahdetMisr Publishing House (https://nahdetmisr.com/).
