Open Access
Preprocessed CHB-MIT Scalp EEG Database
- Submitted by:
- Mrs Deepa B.
- Last updated:
- Tue, 07/26/2022 - 10:54
- DOI:
- 10.21227/awcw-mn88
Abstract
Recent advances in the availability of computational power and in cloud computing have prompted extensive research into epileptic seizure detection and prediction. The EEG (electroencephalogram) datasets from the ‘Dept. of Epileptology, Univ. of Bonn’ and the ‘CHB-MIT Scalp EEG Database’ are publicly available and are the most sought after among researchers. The Bonn dataset is very small compared to CHB-MIT, yet many researchers still prefer it because it is distributed in simple '.txt' format. The dataset published here is a preprocessed form of CHB-MIT, provided in '.csv' format, which makes it straightforward to feed into machine learning and deep learning models.
If the dataset is helpful, please cite the open-access paper below, which describes the procedure and results in detail.
Deepa, B., & Ramesh, K. (2022). Epileptic seizure detection using deep learning through min max scaler normalization. International Journal of Health Sciences, 6(S1), 10981–10996. https://doi.org/10.53730/ijhs.v6nS1.7801
Procedure in short:
- The preprocessing was done in Jupyter Notebook (Anaconda) on an Intel 8th-gen Core i5 processor with 8 GB RAM.
- The data points are extracted from the '.edf' files with the mne package in Python. Equal amounts of preictal and ictal data are extracted.
- A period of 4096 seconds (about 68 minutes) each of preictal and ictal data is extracted from the '.edf' files. All annotated ictal periods from the 24 patients are included in the dataset.
- The data points are loaded and preprocessed as dataframes with the pandas package in Python.
- Because the dataframes are large, make as much system RAM available as possible.
- The file chbmit_preprocessed_data.csv can be used as is for machine learning and deep learning models.
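Given the RAM note above, one way to load chbmit_preprocessed_data.csv is in chunks, downcasting the 23 channels to float32 as each chunk arrives. This is a minimal sketch, not the author's notebook code: the helper name load_preprocessed is hypothetical, and it assumes the outcome column is the last column of the file, as the file description implies.

```python
import io

import numpy as np
import pandas as pd


def load_preprocessed(path_or_buf, chunksize=200_000):
    """Load chbmit_preprocessed_data.csv chunk by chunk (hypothetical helper).

    Returns (X, y): X is a float32 array of shape (n_rows, n_channels) and
    y is an int8 array of outcome labels (0 = preictal, 1 = ictal).
    Assumption: the outcome column is the last column of the file.
    """
    features, labels = [], []
    for chunk in pd.read_csv(path_or_buf, chunksize=chunksize):
        label_col = chunk.columns[-1]  # assumed position of the outcome column
        labels.append(chunk[label_col].to_numpy(dtype=np.int8))
        # Downcast each chunk right away so float64 copies never accumulate.
        features.append(chunk.drop(columns=label_col).to_numpy(dtype=np.float32))
    return np.concatenate(features), np.concatenate(labels)


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real file (two channels + outcome).
    demo = io.StringIO("ch1,ch2,outcome\n1.0,2.0,0\n3.0,4.0,1\n5.0,6.0,0\n")
    X, y = load_preprocessed(demo, chunksize=2)
    print(X.shape, X.dtype, np.bincount(y))
```

On the full file this holds 2,097,150 x 23 values, roughly 193 MB as float32 versus roughly 386 MB as float64. If the outcome column is not last in your copy, select it by name instead.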
Data Availability:
The dataset contains the following files:
- chbmit_ictal_raw_data.csv: ictal data only, from all 24 patients. The recorded channels vary widely between recordings, giving 96 columns in this file.
- chbmit_preictal_raw_data.csv: preictal data only, from all 24 patients. The recorded channels vary widely between recordings, giving 96 columns in this file.
- chbmit_preictal_23channels_data.csv: preictal data only, from all 24 patients. Only 23 channels are retained, giving 23 columns in this file.
- chbmit_ictal_23channels_data.csv: ictal data only, from all 24 patients. Only 23 channels are retained, giving 23 columns in this file.
- chbmit_preprocessed_data.csv: balanced preictal and ictal data from all 24 patients. Only 23 channels are retained and an outcome column is added, giving 24 columns in this file. In the outcome column, '0' indicates preictal and '1' indicates ictal.
- Recently added:
- sizuretimes.xlsx: 24 sheets (seizure info: patient and file number, start and stop times, data points)
- 278 files (139 preictal + 139 ictal) named ptno_fileno_seizureORnoseizure.csv (raw data)
This dataset was prepared with data reduction techniques. Further data cleaning and data transformation should be done as suits the application or model under development.
The two recently added items (the seizure-times workbook and the 278 per-file CSVs) can be used to access all raw data from the 24 patients.
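The seizure start and stop times in the workbook can be turned into sample indices for slicing the per-file raw data. A minimal sketch, assuming the 256 Hz sampling rate of the original CHB-MIT recordings and assuming, since this page does not say how the preictal period was chosen, that each preictal window is the equal-length stretch just before onset. The helper names seconds_to_samples and seizure_windows are hypothetical.

```python
FS_HZ = 256  # sampling rate of the original CHB-MIT recordings (samples per second)


def seconds_to_samples(t_s, fs=FS_HZ):
    """Convert a time in seconds to a sample index."""
    return int(round(t_s * fs))


def seizure_windows(start_s, end_s, fs=FS_HZ):
    """Return (preictal, ictal) half-open sample ranges of equal length.

    Assumption: the preictal window is the stretch immediately before seizure
    onset with the same length as the seizure. The dataset page does not
    define the preictal period, so adjust this to your own definition.
    """
    onset = seconds_to_samples(start_s, fs)
    offset = seconds_to_samples(end_s, fs)
    length = offset - onset
    if length <= 0:
        raise ValueError("seizure end must come after its start")
    if onset - length < 0:
        raise ValueError("recording is too short before onset for a preictal window")
    return range(onset - length, onset), range(onset, offset)


if __name__ == "__main__":
    # Hypothetical seizure from 100 s to 140 s within one recording.
    preictal, ictal = seizure_windows(100, 140)
    print(preictal, ictal)
    # The 4096-second extraction period from the procedure, in samples.
    print(seconds_to_samples(4096))
```

If row 0 of a per-file CSV corresponds to the start of its recording (an assumption worth checking against the workbook's data-point columns), each range can slice the loaded rows directly, e.g. `df.iloc[ictal.start:ictal.stop]` with pandas.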
Original Data:
The original raw dataset in '.edf' format is available at https://physionet.org/content/chbmit/1.0.0/ and should be cited as:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220
Dataset Files
- chbmit_ictal_23channels_data.csv (243.15 MB): 1,048,575 rows x 23 channels
- chbmit_preictal_23channels_data.csv (225.08 MB): 1,048,575 rows x 23 channels
- chbmit_ictal_raw_data.csv (1.79 GB): 1,048,575 rows x 96 channels, with missing data
- chbmit_preictal_raw_data..csv (1.83 GB): 1,048,575 rows x 96 channels, with missing data
- chbmit_preprocessed_data.csv (626.22 MB): 2,097,150 rows x 23 channels plus an outcome column
- sizuretimes.xlsx (106.92 kB): 24 sheets (seizure info: patient and file number, start and stop times, data points)
- patientnumber_filenumber_seizureORnoseizure.zip (426.17 MB): 278 files (139 preictal + 139 ictal) named ptno_fileno_seizureORnoseizure.csv
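The two 96-column raw files are listed as having missing data. Before choosing channels, it can help to measure how empty each column is. The sketch below streams a file in chunks so the 1.8 GB CSV never has to fit in RAM at once. The helper name missing_fraction is hypothetical, it assumes missing samples appear as empty CSV cells (which pandas reads as NaN), and it is only a diagnostic, not a reconstruction of how the 23-channel files were produced.

```python
import io

import pandas as pd


def missing_fraction(path_or_buf, chunksize=100_000):
    """Fraction of missing (NaN) cells per column, computed chunk by chunk.

    Empty CSV cells are read by pandas as NaN. Returns a Series sorted from
    the most complete column to the least complete one.
    """
    missing = None
    n_rows = 0
    for chunk in pd.read_csv(path_or_buf, chunksize=chunksize):
        counts = chunk.isna().sum()  # NaN count per column in this chunk
        missing = counts if missing is None else missing.add(counts, fill_value=0)
        n_rows += len(chunk)
    if n_rows == 0:
        raise ValueError("file contains no data rows")
    return (missing / n_rows).sort_values()


if __name__ == "__main__":
    # Tiny in-memory stand-in with gaps in columns b and c.
    demo = io.StringIO("a,b,c\n1,,3\n4,5,\n7,8,\n")
    print(missing_fraction(demo, chunksize=2))
```

For example, `missing_fraction("chbmit_ictal_raw_data.csv").head(30)` would list the 30 most complete columns of the ictal raw file.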
Open Access dataset files are accessible to all logged-in users. A free IEEE account is sufficient; IEEE membership is not required.
Documentation
Attachment | Size |
---|---|
readme_CHBMIT.txt | 1.73 KB |
Comments
Please send me this dataset.
The dataset is Open Access; you can download it by logging into your IEEE DataPort account.
Best regards, good luck :)
Is it possible to get sensor locations for use in mne-tools?
Unfortunately, no (see https://mne.tools/0.16/manual/io.html#importing-eeg-data). However, the 23 channel names can be read and mapped onto the standard 10-20 EEG electrode placement. Best regards,
Hi,
Can you provide any details about the preprocessing you applied to the dataset? Also, when you extracted the data with the mne package, did you perform any additional preprocessing, or did you only extract the data itself? Thanks
What preictal period did you use when preparing this dataset?
Is it possible to get a version of the dataset that is already preprocessed, with features extracted and seizure and non-seizure points labeled, so that it can be used directly for epileptic seizure detection with a deep learning model?
Hello, I would like the dataset of patients with epilepsy for use in my thesis. Could you please provide it?
Hello, I could not download the dataset. Kindly send it to me.