RITA: a Phraseological dataset of CEFR Assignments and Exams for Italian as a Second Language

Creator: Valentina Franzoni
License: https://creativecommons.org/licenses/by/4.0/
Categories: Artificial Intelligence, Education and Learning Technologies, Machine Learning, Other, Social Sciences, Computational Intelligence, Education

Citation Author(s):
Valentina Franzoni, Department of Mathematics and Computer Science, University of Perugia, Italy
Giulio Biondi, Department of Mathematics and Computer Science, University of Perugia, Italy
Alfredo Milani, Department of Mathematics and Computer Science, University of Perugia, Italy
Valentino Santucci, Department of International Humanities and Social Sciences, Perugia University for Foreigners, Perugia, Italy

Submitted by: Valentina Franzoni
Last updated: Fri, 05/17/2024 - 04:52
DOI: 10.21227/qyg2-ws92
Data Format: *.csv (zip); *.xml (zip)
Research Article Link: RITA: a Phraseological dataset of CEFR Assignments and Exams for Italian as a Second Language
Links: Classification of Text Writing Proficiency of L2 Learners; Parsing Tools for Italian Phraseological Units

Keywords: artificial intelligence; machine learning; natural language processing; sentiment analysis


Abstract

RITA (Resource for Italian Tests Assessment) is a new NLP dataset of academic exam texts written in Italian by second-language learners seeking CEFR certification of their proficiency level.
The RITA dataset is available for automatic processing in CSV and XML formats, provided that the associated article is cited.
In addition to the tests, RITA provides a variety of speech elements, annotations, and statistics, including phraseological units and their syntactic dependencies. The dataset consists of two corpora: one contains the analysis of the task assignments, and the other contains the analysis of the texts that learners wrote in response to those assignments.

The article to be cited also describes the data collection and annotation process, the dataset structure, and the statistics computed to facilitate phraseological analysis of the texts.

The RITA corpus collects data on 3,041 exam texts written by learners of Italian as a second language (L2) at Common European Framework of Reference for Languages (CEFR) levels B1 to C2. The texts were collected and transcribed by the Center for Language Evaluation and Certification (CVCL) at the University for Foreigners of Perugia.
RITA is a valuable resource for researchers and educators interested in Italian phraseology, language assessment, and natural language processing.

The RITA dataset was developed within the PRIN project “PHRAME”, funded by the Italian Ministry of Research (grant no. 20178XXKFY). It is derived directly from the CELI Corpus, which was collected in the same project. Information not included in RITA (such as the original raw texts) can be obtained by querying the CELI Corpus interactively at https://apps.unistrapg.it/cqpweb/

Instructions:

The dataset consists of five tables for the assignments and six tables for the exams.
In each table description, the first column gives the attribute name, the second gives the attribute description, and the third gives any foreign key constraints.
Extract the contents of the zip file and read the README.md file. It describes the structure of both the CSV and the XML versions.
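The steps above can be sketched in Python using only the standard library, reading the tables straight from the zip archive without extracting it first. This is a minimal sketch, not the authors' tooling, and it rests on several assumptions. The archive name, the member file names, the UTF-8 encoding and the comma delimiter are not taken from the dataset, so check README.md for the actual layout. `join_on_key` is a hypothetical helper. It shows how a foreign-key column listed in the table descriptions could link rows of one table to rows of another, and its column names must be taken from the real schema.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET


def list_tables(archive):
    """Return (csv_members, xml_members) found in a zip archive.

    `archive` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
    csv_members = [n for n in names if n.lower().endswith(".csv")]
    xml_members = [n for n in names if n.lower().endswith(".xml")]
    return csv_members, xml_members


def read_csv_table(archive, member, delimiter=","):
    """Read one CSV member into a list of dicts keyed by its header row.

    ASSUMPTION: UTF-8 text (a leading BOM is tolerated) and a comma
    delimiter; check README.md and adjust `delimiter` if the files differ.
    """
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as raw:
            with io.TextIOWrapper(raw, encoding="utf-8-sig", newline="") as text:
                return list(csv.DictReader(text, delimiter=delimiter))


def read_xml_root(archive, member):
    """Parse one XML member and return its root element."""
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as raw:
            return ET.parse(raw).getroot()


def join_on_key(child_rows, parent_rows, fk, pk):
    """Pair each child row with the parent row its foreign key points to.

    HYPOTHETICAL helper: `fk` and `pk` are column names that must come from
    the table descriptions; rows with a dangling key are paired with None.
    """
    index = {row[pk]: row for row in parent_rows}
    return [(row, index.get(row[fk])) for row in child_rows]
```

For example, `csvs, xmls = list_tables("RITA.zip")` (the archive name is hypothetical) lists the available tables. Each CSV member can then be loaded with `read_csv_table`. Returning plain dicts keeps the sketch dependency-free, and the rows can be passed to a dataframe library afterwards if preferred.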

Please cite:
G. Biondi, V. Franzoni, Y. Li, A. Milani and V. Santucci, "RITA: A Phraseological Dataset of CEFR Assignments and Exams for Italian as a Second Language," 2023 IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), Venice, Italy, 2023, pp. 425-430, doi: 10.1109/WI-IAT59888.2023.00070.

Funding Agency: Italian Ministry of Research PRIN
Grant Number: 20178XXKFY
Data Descriptor Article DOI: https://ieee-dataport.org/10.1109/WI-IAT59888.2023.00070

Comments

An open-access version, up to date as of 4 September 2023, is also available at DOI: 10.5281/zenodo.8313261.
This IEEE version may be updated in the future.

Submitted by Valentina Franzoni on Tue, 09/05/2023 - 05:25

Dataset Files

Files have not been uploaded for this dataset

Documentation

Attachment	Size
README.md	3.8 KB
