Simulated Student Dataset for Fairness Analysis in Predicted Grading Models

Citation Author(s):
Mounia Drissi
Submitted by:
Mounia Drissi
Last updated:
Fri, 04/11/2025 - 04:51
DOI:
10.21227/bmy1-mw34
Data Format:
License:

Abstract 

This dataset contains simulated records for 3,000 students, generated for the purpose of evaluating fairness in predicted grading models. The dataset includes decile rankings based on historical performance, predicted grades, and demographic attributes such as socioeconomic status, school type, gender, and ethnicity. The data was created using controlled randomization techniques and includes noise to reflect real-world prediction uncertainty. While entirely synthetic, the dataset is designed to mimic key structural patterns relevant to algorithmic fairness and educational inequality. It may be used to test grading algorithms, simulate bias detection, or serve as a reproducible example in education-focused machine learning research.
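The generation approach described above (random draws plus noise) could be sketched roughly as follows. This is a minimal illustration, not the author's actual procedure: the column names, category labels, 1-10 grade scale, noise level, and the `simulate_students` function are all assumptions, and the real generation logic is documented in the README.

```python
import random


def simulate_students(n=3000, seed=0, noise_sd=1.0):
    """Generate n synthetic student records (illustrative only)."""
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    records = []
    for student_id in range(1, n + 1):
        # Assumed convention: decile 1 is the lowest historical performance.
        decile = rng.randint(1, 10)
        # Hypothetical rule: the predicted grade tracks the decile,
        # with Gaussian noise standing in for prediction uncertainty.
        raw = decile + rng.gauss(0, noise_sd)
        predicted_grade = min(10, max(1, round(raw)))
        records.append({
            "student_id": student_id,
            "decile": decile,
            "predicted_grade": predicted_grade,
            # Category labels below are placeholders, not the dataset's values.
            "ses": rng.choice(["low", "medium", "high"]),
            "school_type": rng.choice(["public", "private"]),
            "gender": rng.choice(["female", "male"]),
            "ethnicity": rng.choice(["group_a", "group_b", "group_c"]),
        })
    return records


students = simulate_students()
```

In this sketch the demographic attributes are drawn independently of the grade, so any group gap is due to chance; a fairness test bed would normally add a deliberate, known dependence between a demographic attribute and the noise term.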

Instructions: 

The dataset can be downloaded as a CSV file and opened in any spreadsheet application (e.g., Excel) or analyzed using statistical software (e.g., R, Python, SPSS). Each row represents a simulated student record with associated demographic and predicted grade information. For variable definitions and generation logic, please refer to the README file provided. The dataset is suitable for fairness analysis, regression modeling, and reproducibility exercises in educational data science.
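As a starting point for the fairness analysis mentioned above, the sketch below reads CSV rows and compares the average gap between predicted grade and historical decile across groups. The column names (`decile`, `predicted_grade`, `school_type`), the inline sample rows, and the `mean_gap_by_group` helper are hypothetical, and the comparison assumes both values sit on a comparable scale; check the README for the actual variable definitions.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Tiny made-up sample so the example runs without the real file.
SAMPLE_CSV = """student_id,decile,predicted_grade,school_type
1,5,6,private
2,5,4,public
3,8,9,private
4,8,8,public
"""


def mean_gap_by_group(rows, group_col, actual_col="decile",
                      predicted_col="predicted_grade"):
    """Return the mean of (predicted - actual) for each value of group_col."""
    gaps = defaultdict(list)
    for row in rows:
        gap = float(row[predicted_col]) - float(row[actual_col])
        gaps[row[group_col]].append(gap)
    return {group: mean(values) for group, values in gaps.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
gaps = mean_gap_by_group(rows, "school_type")
print(gaps)
```

To run this on the downloaded file, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle, for example `open(path, newline="")`, and adjust the column names to match the README. A positive mean gap for one group and a negative one for another would suggest the predictions favor the first group relative to historical performance.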

Documentation

Attachment: README.txt
Size: 1022 bytes