IIST BCI Dataset-1 for Selected Common Malayalam Words

Name: IIST BCI Dataset-1 for Selected Common Malayalam Words
Creator: Parvathi Nair
License: https://creativecommons.org/licenses/by/4.0/

Citation Author(s): Parvathi Nair (Indian Institute of Technology Roorkee)

Parvathy S S (A J College of Science and Technology, Thonnakkal)

Nancy Sunil (A J College of Science and Technology, Thonnakkal)

Anurag Mukati (Indore Institute of Science and Technology, Indore)

Charu Chauhan (Indian Institute of Space Science and Technology (IIST), Trivandrum)

S Sumitra (Indian Institute of Space Science and Technology (IIST), Trivandrum)

B S Manoj (Indian Institute of Space Science and Technology (IIST), Trivandrum)
Submitted by: Parvathi Nair
Last updated: Tue, 01/09/2024 - 09:12
DOI: 10.21227/83t5-tw46
Data Format: .txt

Keywords:

BCI

Malayalam dataset

International 10-20 system

OpenBCI

Abstract

Technologies that help older patients with neurocognitive disorders communicate with their caregivers are increasingly needed. Research on brain-computer interfaces (BCI), especially thought-to-text translation, has been carried out in several languages, including Chinese and Japanese. In India, however, such research has been hindered by the scarcity of datasets in vernacular languages, including Malayalam. Malayalam is a South Indian language spoken primarily in the state of Kerala by about 34 million people. This scarcity motivated us to generate our own Malayalam dataset, which may support further advances in Malayalam BCI research.

Instructions:

The dataset consists of files generated with an OpenBCI Cyton board. Each EEG recording is a TSV/CSV file of rows and columns. The dataset contains 30 folders, each representing one EEG data collection session: 10 for vocal Malayalam, 10 for vocal English, and 10 for subvocal Malayalam. Each folder contains 26 files, each holding the EEG recording of one word.

The 30 folders are named in the following format:

OpenBCISession_YYYY-MM-DD_HH-MM-SS- language_trial_no
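
As an illustration of the naming format above, the following Python sketch splits a folder name into its session start time and the remaining label. The function name `parse_session_folder` is hypothetical, and the exact layout of the language/trial suffix is an assumption, so the suffix is returned as an unparsed string.

```python
import re
from datetime import datetime

# Matches the documented prefix "OpenBCISession_YYYY-MM-DD_HH-MM-SS".
# Assumption: whatever follows (after an optional hyphen and spaces) is the
# language/trial label; its internal layout is not parsed here.
_SESSION_RE = re.compile(
    r"^OpenBCISession_(\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})-?\s*(.*)$"
)


def parse_session_folder(name):
    """Return (session_start, label) for a session folder name, or None."""
    match = _SESSION_RE.match(name)
    if match is None:
        return None
    started = datetime.strptime(match.group(1), "%Y-%m-%d_%H-%M-%S")
    return started, match.group(2).strip()
```

Folder names that do not start with the documented prefix yield `None`, so the function can be used to filter a directory listing.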

File structure:

Each file consists of 25 columns.

Column 1 : Sample index

Column 2-9 : EXG channel signals

Column 10-12 : Accelerometer readings (not relevant)

Column 13-22, 24 : Other (not relevant)

Column 23 : Unix timestamp

Column 25 : Formatted timestamp
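
The column layout above could be read with a short Python sketch like the one below. It uses only the standard library. The function name `parse_recording` is hypothetical, and two things are assumptions rather than documented facts: that metadata lines start with `%`, and that a non-numeric header row may precede the samples. Both are skipped if present.

```python
import csv

# 1-based columns from the file structure, converted to 0-based indices.
EXG_COLS = slice(1, 9)    # columns 2-9: EXG channel signals
UNIX_TS_COL = 22          # column 23: Unix timestamp
FORMATTED_TS_COL = 24     # column 25: formatted timestamp
EXPECTED_COLUMNS = 25


def parse_recording(text):
    """Parse the text of one recording file into a list of sample dicts."""
    # Assumption: lines starting with '%' are metadata comments.
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("%")]
    if not lines:
        return []
    # The files are described as tsv/csv, so pick the delimiter from the data.
    delimiter = "\t" if "\t" in lines[0] else ","
    samples = []
    for row in csv.reader(lines, delimiter=delimiter):
        row = [cell.strip() for cell in row]
        if len(row) != EXPECTED_COLUMNS:
            continue  # skip malformed rows
        try:
            index = float(row[0])
        except ValueError:
            continue  # skip a textual header row, if one exists
        samples.append({
            "index": index,
            "exg": [float(v) for v in row[EXG_COLS]],
            "unix_time": float(row[UNIX_TS_COL]),
            "formatted_time": row[FORMATTED_TS_COL],
        })
    return samples
```

Calling `parse_recording(open(path).read())` on one word file would give one dict per sample, with the eight EXG values in `exg`; the accelerometer and other columns are ignored because they are marked as not relevant.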

This dataset consists of EEG recordings of 26 Malayalam words (vocal and subvocal) and the corresponding English words, recorded using an OpenBCI device. Ten trials were recorded for each word.

Parvathi Nair, Tue, 01/09/2024 - 09:20