Standard Dataset
IIST BCI Dataset-1 for Selected Common Malayalam Words
- Submitted by: Parvathi Nair
- Last updated: Tue, 01/09/2024 - 04:12
- DOI: 10.21227/83t5-tw46
Abstract
In today’s context, it is essential to develop technologies that help older patients with neurocognitive disorders communicate better with their caregivers. Research on brain-computer interfaces (BCI), especially thought-to-text translation, has been carried out in several languages, including Chinese and Japanese. However, such research has been hindered in India by the scarcity of datasets in vernacular languages, including Malayalam. Malayalam is a South Indian language spoken primarily in the state of Kerala by about 34 million people. This motivated us to generate our own Malayalam dataset, which could lead to further advances in Malayalam BCI research.
The dataset consists of files generated using an OpenBCI Cyton board. Each EEG recording is stored as a TSV/CSV file. The dataset contains 30 folders, each representing one EEG data collection session. Of the 30 folders, 10 are for vocal Malayalam, 10 are for vocal English, and 10 are for subvocal Malayalam. Each folder contains 26 files, each holding the EEG recording of one word.
The 30 folders are named in the following format:
OpenBCISession_YYYY-MM-DD_HH-MM-SS- language_trial_no
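The naming format above can be split into a timestamp and a trailing label. The following is a minimal sketch, not part of the dataset's own tooling: the function name `parse_session_folder` and the example folder name are hypothetical, and the internal layout of the trailing label (language and trial number) is deliberately left unparsed because the page does not spell it out.

```python
import re
from datetime import datetime

# Assumed pattern: fixed prefix, timestamp, a hyphen, then a free-form label.
# Optional whitespace after the hyphen is tolerated.
FOLDER_RE = re.compile(
    r"^OpenBCISession_(\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})-\s*(.+)$"
)


def parse_session_folder(name):
    """Return (timestamp, label) for a session folder name.

    The label is returned as-is; how it encodes language and trial
    number is not assumed here.
    """
    match = FOLDER_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected folder name: {name!r}")
    timestamp = datetime.strptime(match.group(1), "%Y-%m-%d_%H-%M-%S")
    return timestamp, match.group(2)


# Hypothetical folder name, for illustration only.
timestamp, label = parse_session_folder(
    "OpenBCISession_2023-05-01_10-30-00-malayalam_1"
)
print(timestamp.isoformat(), label)
```

Returning the label unparsed keeps the sketch usable even if the actual suffix differs from the illustrative one.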
File structure:
Each file consists of 25 columns.
Column 1 : Sample index
Columns 2-9 : EXG channel signals
Columns 10-12 : Accelerometer readings (not relevant)
Columns 13-22, 24 : Other (not relevant)
Column 23 : Unix timestamp
Column 25 : Formatted timestamp
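The column layout above can be turned into a small reader that keeps only the relevant fields. This is a minimal sketch using the Python standard library, not the authors' code: the function name `parse_recording` is hypothetical, the delimiter must be chosen to match the file (comma or tab), and the code assumes that any header or comment lines can be recognized by not having 25 fields or not starting with a numeric sample index.

```python
import csv

N_COLUMNS = 25            # per the file structure described above
EXG_SLICE = slice(1, 9)   # columns 2-9 (1-based) are indices 1..8 (0-based)
UNIX_TS_IDX = 22          # column 23
FORMATTED_TS_IDX = 24     # column 25


def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def parse_recording(lines, delimiter=","):
    """Yield one dict per data row, keeping only the relevant columns.

    Rows without exactly 25 fields, or whose first field is not numeric
    (for example header or comment lines, if present), are skipped.
    """
    for fields in csv.reader(lines, delimiter=delimiter):
        fields = [f.strip() for f in fields]
        if len(fields) != N_COLUMNS or not _is_number(fields[0]):
            continue
        yield {
            "sample_index": int(float(fields[0])),
            "exg": [float(v) for v in fields[EXG_SLICE]],
            "unix_timestamp": float(fields[UNIX_TS_IDX]),
            "formatted_timestamp": fields[FORMATTED_TS_IDX],
        }


# Synthetic row for illustration only; the values are not from the dataset.
demo_row = (
    ["0"]                                   # column 1: sample index
    + [str(float(i)) for i in range(1, 9)]  # columns 2-9: EXG channels
    + ["0"] * 13                            # columns 10-22: not relevant
    + ["1700000000.5", "0", "2023-11-14 22:13:20.500"]  # columns 23-25
)
for sample in parse_recording([",".join(demo_row)]):
    print(sample["sample_index"], len(sample["exg"]), sample["unix_timestamp"])
```

To read an actual file, an open file object can be passed in place of the list, for example `parse_recording(open(path, newline=""), delimiter="\t")` for a tab-separated recording.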
Comments
This dataset consists of EEG recordings of 26 Malayalam words (vocal and subvocal) and the corresponding English words, recorded using an OpenBCI device. Ten trials were recorded for each word.