Preprocessed CHB-MIT Scalp EEG Database
Recent advances in the availability of computational power and in cloud computing have prompted extensive research in epileptic seizure detection and prediction. The EEG (electroencephalogram) datasets from the ‘Dept. of Epileptology, Univ. of Bonn’ and the ‘CHB-MIT Scalp EEG Database’ are publicly available and are the most sought after amongst researchers. The Bonn dataset is very small compared to CHB-MIT, but researchers still prefer it because it is in a simple '.txt' format. The dataset published here is a preprocessed form of CHB-MIT in '.csv' format, which makes it straightforward to use with machine learning and deep learning models.
- The preprocessing was done in Jupyter Notebook (Anaconda) on an Intel 8th gen i5 processor with 8 GB RAM.
- The dataset was prepared by extracting data points from the '.edf' files using the MNE package in Python. Equal amounts of preictal and ictal data were extracted.
- A total of 4096 seconds (about 68 minutes) each of preictal and ictal data was extracted from the '.edf' files. All annotated ictal periods for the 24 patients are included in the dataset.
- Data points are loaded and preprocessed as dataframes using the pandas package in Python.
- Because the dataframes are large, as much system RAM as possible should be made available.
- The file chbmit_preprocessed_data.csv can be used as is for machine learning and deep learning models.
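The extraction step listed above could look roughly like the following sketch. It is not the author's notebook: the function names are hypothetical, and the rule used for the preictal window (same length as the seizure, ending at seizure onset) is an assumption, since the description does not state how the preictal segments were chosen. It assumes the MNE and pandas packages are installed.

```python
def ictal_and_preictal_bounds(seizure_start_s, seizure_end_s):
    """Return (preictal, ictal) intervals in seconds.

    Assumption: the preictal window has the same length as the seizure
    and ends at seizure onset. The dataset description does not state
    the rule that was actually used.
    """
    duration = seizure_end_s - seizure_start_s
    preictal_start = max(0.0, seizure_start_s - duration)
    return (preictal_start, seizure_start_s), (seizure_start_s, seizure_end_s)


def read_segment(edf_path, start_s, end_s):
    """Read one time interval of an EDF recording into a DataFrame."""
    # Imported lazily so the interval helper above works without these packages.
    import mne
    import pandas as pd

    raw = mne.io.read_raw_edf(edf_path, preload=False, verbose=False)
    start, stop = raw.time_as_index([start_s, end_s])
    data = raw.get_data(start=start, stop=stop)  # shape: (channels, samples)
    # One row per sample, one column per channel.
    return pd.DataFrame(data.T, columns=raw.ch_names)
```

The file path and seizure times passed to these helpers would come from the annotations that accompany the original recordings on PhysioNet. Segments read this way can then be concatenated per class and written out with `DataFrame.to_csv`.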
Data Availability:
The dataset contains the following files.
- chbmit_ictal_raw_data.csv : This file contains only ictal data from all 24 patients. The channels vary widely between recordings and amount to 96 columns in this file.
- chbmit_preictal_raw_data.csv : This file contains only preictal data from all 24 patients. The channels vary widely between recordings and amount to 96 columns in this file.
- chbmit_preictal_23channels_data.csv : This file contains only preictal data from all 24 patients. Only 23 channels are retained, giving 23 columns in this file.
- chbmit_ictal_23channels_data.csv : This file contains only ictal data from all 24 patients. Only 23 channels are retained, giving 23 columns in this file.
- chbmit_preprocessed_data.csv : This file contains balanced preictal and ictal data from all 24 patients. Only 23 channels are retained and an outcome column is added, giving 24 columns in this file. In the outcome column, '0' indicates preictal and '1' indicates ictal.
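As an illustration of working with chbmit_preprocessed_data.csv on a machine with limited RAM, the sketch below reads the file in chunks and counts the rows per class. It assumes the file has a header row and that the outcome label is the last column; neither is confirmed by the description, so both should be checked against the actual file. The function names are hypothetical.

```python
import pandas as pd


def count_outcomes(csv_path, chunksize=100_000):
    """Count rows per class without loading the whole file at once.

    Assumption: the file has a header row and the outcome label is the
    last column.
    """
    counts = {}
    for chunk in pd.read_csv(csv_path, chunksize=chunksize):
        labels = chunk.iloc[:, -1]
        for value, n in labels.value_counts().items():
            counts[int(value)] = counts.get(int(value), 0) + int(n)
    return counts


def split_features_labels(frame):
    """Split a frame into channel columns (X) and the outcome column (y)."""
    return frame.iloc[:, :-1], frame.iloc[:, -1]
```

If the file is balanced as described, `count_outcomes("chbmit_preprocessed_data.csv")` should report the same number of rows for classes 0 and 1.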
This dataset was prepared using data reduction techniques. Data cleaning and data transformation still need to be carried out as appropriate for the application or model under development.
The original raw dataset in '.edf' format is available at https://physionet.org/content/chbmit/1.0.0/ and should be cited as:
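What cleaning and transformation are appropriate depends on the model, so the following is only one possible sketch: per-channel standardization followed by cutting the signal into fixed-length windows. The function names, window length and step are hypothetical choices, not part of the dataset.

```python
import numpy as np


def zscore_channels(x):
    """Standardize each channel (column) to zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # constant channels become all zeros instead of dividing by 0
    return (x - x.mean(axis=0)) / std


def make_windows(x, window, step):
    """Cut a (samples, channels) array into (n_windows, window, channels)."""
    x = np.asarray(x)
    starts = range(0, x.shape[0] - window + 1, step)
    if len(starts) == 0:
        # Signal shorter than one window: return an empty stack.
        return np.empty((0, window, x.shape[1]), dtype=x.dtype)
    return np.stack([x[s:s + window] for s in starts])
```

Windowing is best applied separately to the preictal and ictal rows, since a window cut across the boundary between the two classes would mix labels.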
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220
- chbmit_ictal_23channels_data.csv (243.15 MB) : 1048575 values x 23 channels
- chbmit_preictal_23channels_data.csv (225.08 MB) : 1048575 values x 23 channels
- chbmit_ictal_raw_data.csv (1.79 GB) : 1048575 values x 96 channels with missing data
- chbmit_preictal_raw_data..csv (1.83 GB) : 1048575 values x 96 channels with missing data
- chbmit_preprocessed_data.csv (626.22 MB) : 2097150 values x 23 channels with outcome column
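The row counts above can be related to the stated 4096 seconds of data and to the RAM needed to hold a file in memory. The sketch below assumes a sampling rate of 256 Hz, which is the rate given in the CHB-MIT description on PhysioNet and is not stated in this document, and it assumes every value is held as a 64-bit float. The function names are hypothetical.

```python
SAMPLING_RATE_HZ = 256  # assumption: sampling rate from the CHB-MIT description


def duration_seconds(n_rows, sfreq=SAMPLING_RATE_HZ):
    """Recording time represented by n_rows samples."""
    return n_rows / sfreq


def float64_megabytes(n_rows, n_columns):
    """Approximate size of a dense float64 table, in MB (10**6 bytes)."""
    return n_rows * n_columns * 8 / 1e6


print(duration_seconds(1048575))       # about 4096 seconds
print(float64_megabytes(1048575, 96))  # about 805 MB for one raw 96-column file
print(float64_megabytes(2097150, 24))  # about 403 MB for the preprocessed file
```

These figures are lower bounds, since parsing a '.csv' file typically needs additional working memory on top of the final table.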