RITA: a Phraseological dataset of CEFR Assignments and Exams for Italian as a Second Language

Citation Author(s):
Valentina Franzoni, Department of Mathematics and Computer Science, University of Perugia, Italy
Giulio Biondi, Department of Mathematics and Computer Science, University of Perugia, Italy
Alfredo Milani, Department of Mathematics and Computer Science, University of Perugia, Italy
Valentino Santucci, Department of International Humanities and Social Sciences, Perugia University for Foreigners, Perugia, Italy
Submitted by:
Valentina Franzoni
Last updated:
Mon, 09/04/2023 - 12:32
DOI:
10.21227/qyg2-ws92

Abstract 

RITA (Resource for Italian Tests Assessment) is a new dataset of academic exam texts written in Italian by second-language learners to obtain CEFR proficiency certification.
In addition to the tests, RITA provides a variety of speech elements, annotations, and statistics, including phraseological units and their syntactic dependencies. The dataset consists of two corpora: one containing the task assignments and the other containing the texts written by the learners in response to those assignments. This work describes the data collection and annotation process, the dataset structure, and the statistics computed to facilitate phraseological analysis of the texts.
The RITA corpus is a collection of 3041 exam texts written by learners of Italian as a second language (L2) at levels B1 to C2 of the Common European Framework of Reference for Languages (CEFR), collected and transcribed by the Center for Language Evaluation and Certification (CVCL) at the University for Foreigners of Perugia.
RITA is a valuable resource for researchers and educators interested in Italian phraseology, language assessment, and natural language processing. Funded by the Italian Ministry of Research PRIN Project “PHRAME”, Grant n. 20178XXKFY.

Instructions: 

The dataset consists of five tables for the assignments and six tables for the exams.
For each table in the dataset, the first column gives the attribute name, the second column the attribute description, and the third column the foreign key constraints.
Extract the contents of the zip file and read the readme.me file, which describes the structure of the CSV and XML versions.
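
The extraction step can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not part of the dataset itself: the archive name `rita.zip`, the output folder `rita_data`, and the helper `load_csv_tables` are hypothetical, and the sketch assumes the CSV files are comma-separated, UTF-8 encoded, and start with a header row. Check the readme for the actual file names and layout.

```python
import csv
import zipfile
from pathlib import Path


def load_csv_tables(zip_path, out_dir):
    """Extract the archive and return {file stem: list of row dicts} for every CSV found."""
    out_dir = Path(out_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)

    tables = {}
    for csv_file in sorted(out_dir.rglob("*.csv")):
        # Assumption: UTF-8 text with a header row; utf-8-sig also tolerates a BOM.
        with open(csv_file, newline="", encoding="utf-8-sig") as handle:
            tables[csv_file.stem] = list(csv.DictReader(handle))
    return tables


if __name__ == "__main__":
    # "rita.zip" is a placeholder, not the actual archive name.
    if Path("rita.zip").exists():
        for name, rows in load_csv_tables("rita.zip", "rita_data").items():
            print(name, len(rows), "rows")
```

Each table is returned as a list of dictionaries keyed by column name, so rows from related tables can be matched on the foreign keys documented in the readme.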

Funding Agency: 
Italian Ministry of Research PRIN
Grant Number: 
20178XXKFY

Comments

An open-access version is also available at DOI: 10.5281/zenodo.8313261, up to date as of 4 Sept 2023.
This IEEE version may be updated in the future.

Submitted by Valentina Franzoni on Tue, 09/05/2023 - 05:25