IIST BCI Dataset-1 for Selected Common Malayalam Words

Citation Author(s):
Parvathi Nair, Indian Institute of Technology Roorkee
Parvathy S S, A J College of Science and Technology, Thonnakkal
Nancy Sunil, A J College of Science and Technology, Thonnakkal
Anurag Mukati, Indore Institute of Science and Technology, Indore
Charu Chauhan, Indian Institute of Space Science and Technology (IIST), Trivandrum
S Sumitra, Indian Institute of Space Science and Technology (IIST), Trivandrum
B S Manoj, Indian Institute of Space Science and Technology (IIST), Trivandrum
Submitted by:
Parvathi Nair
Last updated:
Tue, 01/09/2024 - 04:12
DOI:
10.21227/83t5-tw46
Data Format:
License:

Abstract 

In today’s context, it is essential to develop technologies that help older patients with neurocognitive disorders communicate better with their caregivers. Research in Brain-Computer Interfaces (BCI), especially in thought-to-text translation, has been carried out in several languages, including Chinese and Japanese. However, research of this nature has been hindered in India by the scarcity of datasets in vernacular languages, including Malayalam. Malayalam is a South Indian language spoken primarily in the state of Kerala by about 34 million people. This motivated us to generate our own Malayalam dataset, which could lead to further advances in Malayalam BCI research.

Instructions: 

The dataset consists of files generated using an OpenBCI Cyton board. Each EEG sample is a tsv/csv file. The dataset contains 30 folders, each representing one EEG data collection session. Of the 30 folders, 10 are for vocal Malayalam, 10 are for vocal English and 10 are for subvocal Malayalam. Each folder contains 26 files, each holding the EEG recording of one word.

The 30 folders are named in the following format:

OpenBCISession_YYYY-MM-DD_HH-MM-SS- language_trial_no
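As a minimal sketch of how the folder naming format above might be parsed, the following Python snippet extracts the session timestamp and keeps everything after it as a raw suffix. The function name `parse_session_folder`, the regular expression, and the sample folder name are illustrative assumptions and are not taken from the dataset; the exact spelling of the language and trial part is not specified here, so the sketch does not try to split it further.

```python
import re
from datetime import datetime

# Assumed pattern: fixed prefix, date, time, a hyphen, then a free-form suffix
# carrying the language and trial number. Optional whitespace after the hyphen
# is tolerated because the documented format shows a space there.
_FOLDER_RE = re.compile(
    r"^OpenBCISession_(\d{4}-\d{2}-\d{2})_(\d{2}-\d{2}-\d{2})-\s*(.*)$"
)


def parse_session_folder(name):
    """Return the session start time and the raw suffix, or None if no match."""
    match = _FOLDER_RE.match(name)
    if match is None:
        return None
    date_part, time_part, suffix = match.groups()
    started = datetime.strptime(f"{date_part}_{time_part}", "%Y-%m-%d_%H-%M-%S")
    return {"started": started, "suffix": suffix}


if __name__ == "__main__":
    # Hypothetical folder name, for illustration only.
    print(parse_session_folder("OpenBCISession_2023-05-01_10-30-00-malayalam_vocal_1"))
```

The suffix is returned unchanged so that it can be inspected against the real folder names before any further splitting is attempted.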

File structure:

Each file consists of 25 columns.

Column 1 : Sample index

Columns 2-9 : EXG channel signals

Columns 10-12 : Accelerometer readings (not relevant)

Columns 13-22, 24 : Other (not relevant)

Column 23 : Unix timestamp

Column 25 : Formatted timestamp
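The column layout listed above can be mapped onto named fields with a short Python sketch such as the one below. It uses only the standard library. The function names, the tab delimiter default, and the rule of skipping lines that start with `%` or whose first field is not numeric are assumptions made for illustration; check them against the actual files before relying on them.

```python
import csv
import io

N_COLUMNS = 25  # each file is described as having 25 columns


def _is_number(value):
    """Return True if the string can be read as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def parse_row(fields):
    """Map one 25-field row onto the documented column layout."""
    if len(fields) != N_COLUMNS:
        raise ValueError(f"expected {N_COLUMNS} columns, got {len(fields)}")
    return {
        "sample_index": int(float(fields[0])),      # column 1
        "exg": [float(v) for v in fields[1:9]],     # columns 2-9
        "unix_timestamp": float(fields[22]),        # column 23
        "formatted_timestamp": fields[24].strip(),  # column 25
    }


def read_recording(text, delimiter="\t"):
    """Parse the text of one recording into a list of row dictionaries.

    Assumption: comment lines start with '%' and any header row has a
    non-numeric first field; both kinds of line are skipped.
    """
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not fields or not _is_number(fields[0].strip()):
            continue
        rows.append(parse_row(fields))
    return rows
```

For a comma-separated file, pass `delimiter=","`. The accelerometer and other columns are dropped because the description marks them as not relevant.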

Data Descriptor Article DOI: 

Comments

This dataset consists of EEG recordings of 26 Malayalam words (vocal and subvocal) and the corresponding English words, recorded using an OpenBCI device. Ten trials were taken for each word.

Submitted by Parvathi Nair on Tue, 01/09/2024 - 04:20