Synthetic EEG Dataset for CNN Training: Clean and Artifact-Contaminated Signals

Citation Author(s):
Marcin Jurczak, Warsaw University of Technology
Marcin Kołodziej, Warsaw University of Technology
Andrzej Majkowski, Warsaw University of Technology
Submitted by: Marcin Kolodziej
Last updated: Tue, 02/04/2025 - 04:58
DOI: 10.21227/9hvs-8657

Abstract 

This dataset consists of synthetically generated EEG and EMG signals for training convolutional neural networks (CNNs) to detect and remove artifacts. It includes both clean EEG signals and EEG signals contaminated with simulated EMG artifacts from five electrode sites (Fp1, HEOG, nape, cheek, and jaw).

This dataset is useful for training and evaluating machine learning models aimed at artifact correction, signal denoising, and EEG preprocessing.

Instructions: 

Description

The signals are structured as 80,000 examples, each representing 1 second of data sampled at 256 Hz. The dataset is stored in two files:

  • X.mat – Contains EEG signals with artifacts and corresponding EMG artifact sources.
  • y.mat – Contains the clean EEG signals (artifact-free).
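Loading the two files could look like the minimal sketch below. It assumes the arrays are stored under the variable names "X" and "y" (the page does not state the internal variable names, so x_key and y_key are hypothetical) and that the files are in a MAT format that scipy.io.loadmat can read. Files saved as MATLAB v7.3 are HDF5-based and make loadmat raise NotImplementedError, in which case an HDF5 reader would be needed instead.

```python
import numpy as np
from scipy.io import loadmat


def load_dataset(x_path="X.mat", y_path="y.mat", x_key="X", y_key="y"):
    """Load contaminated inputs (X) and clean targets (y) from .mat files.

    x_key and y_key are ASSUMED variable names inside the files;
    inspect loadmat(path).keys() if they differ.
    """
    # Cast to float32 to halve memory use compared with float64.
    X = np.asarray(loadmat(x_path)[x_key], dtype=np.float32)
    y = np.asarray(loadmat(y_path)[y_key], dtype=np.float32)
    return X, y


# Example call, once the files are available locally:
# X, y = load_dataset("X.mat", "y.mat")
```

Casting to float32 is a design choice for this sketch, not something the dataset requires.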

Data Structure

X (dimensions: 80000 × 256 × 6)

  • Dimension 1 (80000) – Number of signal examples (training samples).
  • Dimension 2 (256) – Number of samples per signal, corresponding to 1 second of recording at 256 Hz.
  • Dimension 3 (6) – Number of signal channels:
    • Channel 1: EEG signal contaminated with artifacts.
    • Channel 2: Simulated EMG artifact from the Fp1 electrode.
    • Channel 3: Simulated EMG artifact from the HEOG electrode.
    • Channel 4: Simulated EMG artifact from the Nape electrode.
    • Channel 5: Simulated EMG artifact from the Cheek electrode.
    • Channel 6: Simulated EMG artifact from the Jaw electrode.
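The channel layout above can be unpacked as in the following sketch. The function name split_channels and the EMG_SITES tuple are illustrative, not part of the dataset. Note that the listing numbers channels from 1, while NumPy indexes from 0, so Channel 1 is index 0.

```python
import numpy as np

# Order of the EMG reference channels as listed above (Channels 2-6).
EMG_SITES = ("Fp1", "HEOG", "Nape", "Cheek", "Jaw")


def split_channels(X):
    """Split X (examples x samples x 6) into the contaminated EEG
    and a dict of simulated EMG artifact signals keyed by site."""
    if X.ndim != 3 or X.shape[2] != 6:
        raise ValueError(f"expected shape (N, T, 6), got {X.shape}")
    contaminated = X[:, :, 0]  # Channel 1: EEG with artifacts
    emg = {site: X[:, :, i + 1] for i, site in enumerate(EMG_SITES)}
    return contaminated, emg
```

For the full dataset, contaminated would have shape (80000, 256), and each entry of emg would have the same shape.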

y (dimensions: 80000 × 256)

  • Dimension 1 (80000) – Number of signal examples, same as in X.
  • Dimension 2 (256) – Number of samples per signal.
  • y contains the corresponding clean EEG signal (artifact-free) for each of the 80,000 examples.
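Because each row of y is the clean target for the same row of X, any shuffling or splitting has to apply the same indices to both arrays. The sketch below shows one way to do that. The function name train_val_split and the val_fraction and seed parameters are hypothetical choices for illustration; the dataset itself does not prescribe a split.

```python
import numpy as np


def train_val_split(X, y, val_fraction=0.1, seed=0):
    """Shuffle X and y with the same permutation and split them.

    val_fraction and seed are illustrative defaults, not values
    specified by the dataset authors.
    """
    # X and y must agree on the number of examples and samples.
    if X.shape[0] != y.shape[0] or X.shape[1] != y.shape[1]:
        raise ValueError(f"X {X.shape} and y {y.shape} do not align")
    rng = np.random.default_rng(seed)
    order = rng.permutation(X.shape[0])
    n_val = int(round(val_fraction * X.shape[0]))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx])
```

With the full 80,000 examples and val_fraction=0.1, this would hold out 8,000 examples for validation and keep 72,000 for training.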

Dataset Files

    Files have not been uploaded for this dataset