Synthetic EEG Dataset for CNN Training: Clean and Artifact-Contaminated Signals
- Submitted by: Marcin Kolodziej
- Last updated: Tue, 02/04/2025 - 04:58
- DOI: 10.21227/9hvs-8657
Abstract
This dataset consists of synthetically generated EEG and EMG signals designed for training Convolutional Neural Networks (CNNs) in artifact detection and removal. The dataset includes both clean EEG signals and EEG signals contaminated with simulated EMG artifacts from various sources.
This dataset is useful for training and evaluating machine learning models aimed at artifact correction, signal denoising, and EEG preprocessing.
Description
The dataset contains 80,000 examples, each a 1-second signal sampled at 256 Hz (256 samples per example). It is stored in two files:
- X.mat – Contains the artifact-contaminated EEG signals and the corresponding EMG artifact source signals.
- y.mat – Contains the clean EEG signals (artifact-free).
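As an illustration of reading the two files, here is a minimal Python sketch. It makes several assumptions that the dataset description does not confirm: that the files are MAT-files of version 7 or earlier (readable by `scipy.io.loadmat`), and that the arrays are stored under the variable names `X` and `y`. The helper names `load_mat_array` and `check_shapes` are hypothetical.

```python
import numpy as np


def load_mat_array(path, name):
    """Return the variable `name` from a MAT-file as a NumPy array.

    Assumes a MAT-file of version 7 or earlier. scipy.io.loadmat raises
    NotImplementedError for v7.3 (HDF5-based) files, which need an HDF5
    reader instead.
    """
    from scipy.io import loadmat  # lazy import: SciPy is only needed for file I/O

    contents = loadmat(path)
    if name not in contents:
        # Keys starting with "__" are loadmat metadata, not stored variables.
        available = [k for k in contents if not k.startswith("__")]
        raise KeyError(f"{name!r} not found; file contains {available}")
    return np.asarray(contents[name])


def check_shapes(X, y, n_samples=256, n_channels=6):
    """Verify that X is (N, 256, 6) and y is (N, 256) with the same N; return N."""
    if X.ndim != 3 or X.shape[1:] != (n_samples, n_channels):
        raise ValueError(f"unexpected X shape: {X.shape}")
    if y.shape != (X.shape[0], n_samples):
        raise ValueError(f"unexpected y shape: {y.shape}")
    return X.shape[0]
```

Under those assumptions, typical usage would be `X = load_mat_array("X.mat", "X")`, `y = load_mat_array("y.mat", "y")`, followed by `check_shapes(X, y)`. If the variable names differ, the `KeyError` message lists the names actually present in the file.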
Data Structure
X (dimensions: 80000 × 256 × 6)
- Dimension 1 (80000) – Number of signal examples (training samples).
- Dimension 2 (256) – Number of samples per signal, corresponding to 1 second of recording at 256 Hz.
- Dimension 3 (6) – Number of signal channels:
- Channel 1: EEG signal contaminated with artifacts.
- Channel 2: Simulated EMG artifact from the Fp1 electrode.
- Channel 3: Simulated EMG artifact from the HEOG electrode.
- Channel 4: Simulated EMG artifact from the Nape electrode.
- Channel 5: Simulated EMG artifact from the Cheek electrode.
- Channel 6: Simulated EMG artifact from the Jaw electrode.
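The channel layout above can be sketched in Python as a mapping from names to slices of X. The channel names used as dictionary keys and the helper `split_channels` are hypothetical labels chosen for this example; only the channel order comes from the description. Note that the list above counts channels from 1, while NumPy indexes from 0.

```python
import numpy as np

# Channel order along the last axis of X, following the list above.
# Index 0 here corresponds to "Channel 1" in the description.
CHANNELS = (
    "eeg_contaminated",
    "emg_fp1",
    "emg_heog",
    "emg_nape",
    "emg_cheek",
    "emg_jaw",
)


def split_channels(X):
    """Map each channel name to its (N, 256) slice of an (N, 256, 6) array."""
    if X.shape[-1] != len(CHANNELS):
        raise ValueError(f"expected {len(CHANNELS)} channels, got {X.shape[-1]}")
    # Basic slicing returns views, so no data is copied.
    return {name: X[..., i] for i, name in enumerate(CHANNELS)}
```

For example, `split_channels(X)["emg_jaw"]` would return the simulated Jaw artifact for every example as an array of shape (80000, 256).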
y (dimensions: 80000 × 256)
- Dimension 1 (80000) – Number of signal examples, same as in X.
- Dimension 2 (256) – Number of samples per signal.
- y contains the corresponding clean EEG signal (artifact-free) for each of the 80,000 examples.
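Because each row of y is the clean counterpart of the same row in X, one simple reference point for a denoising model is the error obtained by leaving the contaminated signal untouched. The sketch below computes that per-example root-mean-square error. The function name `identity_baseline_rmse` is hypothetical, and the choice of RMSE as the metric is an assumption made for this example, not something the dataset prescribes.

```python
import numpy as np


def identity_baseline_rmse(X, y):
    """Per-example RMSE between the contaminated EEG and the clean EEG.

    X has shape (N, 256, 6), with the contaminated EEG in its first channel;
    y has shape (N, 256). Returns an array of shape (N,).
    """
    contaminated = X[:, :, 0]
    if contaminated.shape != y.shape:
        raise ValueError(f"shape mismatch: {contaminated.shape} vs {y.shape}")
    # Average the squared error over the 256 time samples of each example.
    return np.sqrt(np.mean((contaminated - y) ** 2, axis=1))
```

A trained model would be expected to reach a lower error against y than this do-nothing baseline on held-out examples.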