LMS Moodle Student Log Traces from Blended Computer Science Course (Winter Semester 2023)
- Submitted by: Janka Pecuchova
- Last updated: Fri, 07/19/2024 - 08:56
- DOI: 10.21227/rp92-fz27
Abstract
The dataset was gathered from a virtual learning environment course at Constantine the Philosopher University in Nitra. It includes online activity logs of 152 university students enrolled in a blended Computer Science course during the winter semester from September 25, 2023, to December 21, 2023. The course combined traditional lectures and lab sessions with online interactions and digital access to course materials via the Moodle learning management system (LMS).
Students accessed digital course materials, engaged in collaborative projects, submitted assignments, participated in quizzes, and took active responsibility for their learning. Moodle's logging capabilities recorded each student's interactions in detail, including student identities, IP addresses for each session, precise timestamps of activities, and descriptions of actions taken. These log records amounted to 150,883 entries over the 13-week academic semester, providing a comprehensive dataset for analysis.
This dataset facilitates the analysis of student behavior and interaction with the course, enabling researchers to uncover patterns in online engagement and assess the impact of different activities on student performance. This analysis offers valuable insights into the effectiveness of blended learning environments.
Data pre-processing is crucial to ensure the integrity and suitability of the dataset for subsequent clustering algorithms. Working with the dataset involves the following steps:
Data Transformation: converting all categorical variables to numeric values followed by data normalization or scaling to ensure uniformity in measurement scales.
Data Cleaning: removing or correcting erroneous data entries, handling missing values, and eliminating outliers.
Attribute Selection: identifying and retaining the most significant attributes contributing to the analysis while discarding redundant or irrelevant ones.
Dimensionality Reduction: simplifying the input dataset by reducing the number of dimensions without losing important information.
These stages play an essential role in refining the dataset, enhancing its structure and quality, and ensuring that it is optimally prepared for effective analysis using clustering algorithms.
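As a rough illustration of the scaling and attribute-selection stages listed above, the following sketch rescales numeric columns to the [0, 1] range and discards columns that never vary. It is a minimal example under stated assumptions, not the procedure used to build the dataset: the column names and values are hypothetical, min-max scaling is only one possible normalization, and dropping constant columns stands in for a fuller attribute-selection method. Dimensionality reduction is not shown.

```python
from statistics import pvariance


def min_max_scale(values):
    """Rescale a list of numbers to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def drop_constant_columns(table):
    """Keep only columns whose values vary (a crude attribute selection)."""
    return {name: col for name, col in table.items() if pvariance(col) > 0}


# Hypothetical numeric table: column name -> list of values.
table = {
    "component": [0, 4, 5, 2],
    "weekday": [0, 1, 1, 4],
    "course_id": [7, 7, 7, 7],  # constant, so it is dropped
}

selected = drop_constant_columns(table)
scaled = {name: min_max_scale(col) for name, col in selected.items()}
```

After this step every retained column lies on the same scale, which matters for distance-based clustering because otherwise attributes with large numeric ranges dominate the distance computation.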
Most machine learning algorithms require numeric input and output variables. All attributes, including categorical or nominal variables, must therefore be converted to numeric variables before the data enters the analysis. During data transformation, the variable Component was mapped to the numerical values Assignment: 0, Attendance: 1, Course: 2, Project: 3, Quiz: 4, Study_material: 5. Action and Target were likewise converted into numerical representations using predefined mappings.
Several additional time-related attributes (weekday, week, and month) were extracted from the timestamp attribute to provide deeper insight into the temporal patterns of student interactions.
To protect student privacy, each student’s data was anonymized. This step was critical to maintaining confidentiality and compliance with data protection regulations while still allowing individual tracking across the dataset.
The attributes IP address and description were removed from the dataset. The IP address was omitted because it did not contribute to the academic analysis; its primary function of identifying users, locations, or devices was not relevant to objectives focused on educational outcomes. Similarly, although the description field provides narrative context, it did not offer quantifiable data for clustering or pattern recognition and was thus deemed non-essential. These pre-processing steps ensured the dataset was clean, well-structured, and aligned with the study’s objective of exploring the impact of various educational engagements on student performance.
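The transformation described above can be sketched as follows. Only the Component mapping is taken from the description; the raw field names (`student`, `ip`, `timestamp`, `component`, `description`), the timestamp format, the use of ISO week numbers for the week attribute, and the sequential pseudonym scheme are all assumptions made for illustration. The Action and Target mappings are not listed in the description, so they are left out.

```python
from datetime import datetime

# Mapping stated in the dataset description.
COMPONENT_MAP = {
    "Assignment": 0, "Attendance": 1, "Course": 2,
    "Project": 3, "Quiz": 4, "Study_material": 5,
}


def preprocess(rows):
    """Encode, enrich, anonymize, and trim raw log rows (list of dicts)."""
    pseudonyms = {}  # real student identifier -> stable anonymous ID
    out = []
    for row in rows:
        # Assumed timestamp format; adjust to the actual export format.
        ts = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
        anon_id = pseudonyms.setdefault(row["student"], len(pseudonyms) + 1)
        out.append({
            "student_id": anon_id,
            "component": COMPONENT_MAP[row["component"]],
            "weekday": ts.weekday(),      # Monday = 0
            "week": ts.isocalendar()[1],  # ISO week number (assumption)
            "month": ts.month,
            # "ip" and "description" are intentionally not copied over.
        })
    return out


# Hypothetical raw rows, using documentation-only IP addresses.
raw = [
    {"student": "Student A", "ip": "192.0.2.10", "timestamp": "2023-09-25 10:15:00",
     "component": "Quiz", "description": "viewed quiz"},
    {"student": "Student B", "ip": "192.0.2.11", "timestamp": "2023-12-21 18:30:00",
     "component": "Study_material", "description": "viewed resource"},
    {"student": "Student A", "ip": "192.0.2.10", "timestamp": "2023-10-03 08:00:00",
     "component": "Assignment", "description": "submitted assignment"},
]

clean = preprocess(raw)
```

Because the pseudonym table assigns one ID per distinct student, the same student keeps the same `student_id` across all rows, which preserves individual tracking without exposing identities.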
The next step of data preparation focused on checks for missing values and outliers within the log files. Every entry in the dataset was checked, ensuring completeness and consistency across all data points. This examination confirmed the absence of missing entries, which was crucial for maintaining the integrity of subsequent analyses. The outlier analysis indicated that no extreme outliers were present, affirming the robustness of the dataset for detailed analysis. These steps were vital for ensuring that the dataset was complete and representative.
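One way to run such checks is sketched below. The description does not state which outlier criterion was applied, so the interquartile-range rule with a factor of 3 (a common convention for "extreme" outliers) is an assumption, as are the field names and the per-student event counts used as sample data.

```python
from statistics import quantiles


def count_missing(rows):
    """Count fields that are None or an empty string across all rows."""
    return sum(1 for row in rows for v in row.values() if v is None or v == "")


def extreme_outliers(values, k=3.0):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical log rows and per-student event counts.
rows = [
    {"student_id": 1, "component": 4, "weekday": 0},
    {"student_id": 2, "component": 5, "weekday": 3},
]
events_per_student = [1, 2, 3, 4, 5, 6, 7, 8]

missing = count_missing(rows)
outliers = extreme_outliers(events_per_student)
```

A result of zero missing fields and an empty outlier list corresponds to the outcome reported above; any non-empty result would call for the cleaning step before clustering.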
Comments
Have you also published any paper based on this dataset?