Datasets
Standard Dataset
IIST BCI DATASET-6 FOR SELECTED COMMON ODIA WORDS
- Citation Author(s):
- Submitted by:
- Shivani Sahoo
- Last updated:
- Tue, 08/20/2024 - 06:37
- DOI:
- 10.21227/wg17-6068
- Data Format:
- License:
- Categories:
- Keywords:
Abstract
Brain-Computer Interface (BCI) is a technology that enables direct communication between the brain and external devices, typically by interpreting neural signals. BCI-based solutions for neurodegenerative disorders need datasets with patients’ native languages. However, research in BCI lacks insufficient language-specific datasets, as seen in Odia, spoken by 35-40 million individuals in India. To address this gap, we developed an Electroencephalograph (EEG) based BCI dataset featuring EEG signal samples of commonly spoken Odia words. Using the OpenBCI Cyton device, EEG recordings are collected from a volunteer who speaks Odia language. The dataset is divided into 4 parts: (i) vocal Odia words, (ii) English translations of these Odia words, (iii) sub-vocalization of the Odia words, and (iv) sub-vocalization of English words. The dataset contains information about 100 different words. Each word is recorded with ten trials. By training the dataset using Machine Learning (ML) and Deep Learning (DL) methods, a BCI system can be designed to translate EEG signals into both vocal and subvocal for the Odia and English languages. This can enhance the communication and quality of Odia-speaking patients with neurodegenerative diseases.
The dataset includes files produced by the OpenBCI Cyton Biosensing board.
A. Raw Dataset
RAW dataset is in format of text documents. EEG sample is stored as a file with text values separated by commas and arranged in rows and columns.
Column 1 - sample index is represented
Columns 2 to 9 - EEG recordings from the eight selected channels
Columns 10 to 22 and 24 contain unimportant data
Column 23 - representing time in a raw, unprocessed format.
Column 25 - displays the timestamp in yyyy-mm-dd_hr-min-sec format
B. Processed Dataset
This dataset is in format of .csv files. EEG Channel 0 to EEG Channel 7 columns are considered.
The header lines and unnecessary columns are removed and each csv file is renamed according to the word whose datas are there in that file using Python script (provided).
Note :
In the folder, 1st .txt file corresponds to 1st Odia word, 2nd .txt file corresponds to 2nd Odia word and so on. The list of Odia words are provided.
For more details go through the Data Descriptor Paper.