IIST BCI Dataset-10 for Telugu Vowels and Consonants

- Citation Author(s): Sumitra S (Indian Institute of Space Science and Technology)
- Submitted by: likhith boddapu
- DOI: 10.21227/32qq-c421
Abstract
Brain-Computer Interface (BCI) technology enables direct communication between the brain and external devices by interpreting neural signals. When designing BCI-based solutions for patients with neurological disorders, it is essential to have datasets in the patient's native language. Current BCI research, however, lacks language-specific datasets, notably for languages such as Telugu, which has over 90 million speakers in India. We developed an Electroencephalography (EEG)-based BCI dataset consisting of EEG signal samples for Telugu vowels and consonants. The data were collected with the OpenBCI Cyton board, which recorded EEG from three native Telugu speakers. The dataset is divided into four sections.
1. Vocalized Telugu Vowels and Consonants.
2. Subvocalization of Telugu Vowels and Consonants.
The Telugu Vowels and Consonants dataset was recorded over twenty trials with three volunteer speakers. By training Machine Learning (ML) and Deep Learning (DL) models on this dataset, a BCI system that converts EEG signals into both subvocal and vocal forms can be developed to support the Telugu language.
Instructions:
The dataset consists of EEG samples recorded from three Telugu-speaking volunteers. The samples are stored in text files in comma-separated values (CSV) format.
Each row in the dataset is a separate EEG sample, with the following column layout:
Column 1: Sample Index - This column holds a unique number for each sample.
Columns 2-9: EEG Records - These columns contain the signals from eight EEG channels, each recording electrical activity from a different region of the brain.
Columns 10-22 and 24: Extra Information - These columns hold ancillary information whose relevance depends on the use case.
Column 23: Raw Time Data - This column holds time data in raw form, which may need to be converted or adjusted before analysis.
Column 25: Time Stamps - This column gives a formatted timestamp for every sample, in the format "Year-Month-Day Hour:Minute".
These timestamps are essential for aligning the EEG data with external events or other recordings.
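The column layout above can be mapped onto a small record type. The sketch below is a minimal illustration, not part of the dataset release: the names `EegSample`, `parse_row`, `load_samples`, and `samples_between` are hypothetical, and it assumes that every data row has all 25 columns, that the first field of a data row is numeric, and that column 23 (raw time) holds a numeric value. The column 25 timestamp is kept as text because its exact format should be checked against the files before parsing it.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List, Sequence

# Zero-based positions of the documented columns (Column 1 -> index 0).
INDEX_COL = 0            # Column 1: sample index
EEG_COLS = range(1, 9)   # Columns 2-9: eight EEG channels
RAW_TIME_COL = 22        # Column 23: raw time data
TIMESTAMP_COL = 24       # Column 25: formatted timestamp
N_COLS = 25


@dataclass
class EegSample:
    index: int
    channels: List[float]
    raw_time: float  # assumption: column 23 is numeric (e.g. seconds)
    timestamp: str   # kept as text; parse only once the exact format is confirmed


def _is_number(token: str) -> bool:
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_row(fields: Sequence[str]) -> EegSample:
    """Map one CSV row onto the documented column layout."""
    return EegSample(
        index=int(float(fields[INDEX_COL])),
        channels=[float(fields[i]) for i in EEG_COLS],
        raw_time=float(fields[RAW_TIME_COL]),
        timestamp=fields[TIMESTAMP_COL].strip(),
    )


def load_samples(lines: Iterable[str]) -> List[EegSample]:
    """Parse data rows, skipping blank, header, and metadata lines."""
    samples = []
    for fields in csv.reader(lines):
        # Assumption: data rows have all 25 columns and start with a number.
        if len(fields) >= N_COLS and _is_number(fields[INDEX_COL]):
            samples.append(parse_row(fields))
    return samples


def samples_between(samples: Iterable[EegSample], start: float, end: float) -> List[EegSample]:
    """Select samples whose raw time lies in [start, end), e.g. a window around an event."""
    return [s for s in samples if start <= s.raw_time < end]
```

To read a file, pass the file object (opened with `newline=""`) directly to `load_samples`. Windowing on the raw time column is one simple way to align samples with external event times; it assumes the event times are expressed in the same units as column 23.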
Besides the EEG data, the text files may contain metadata or other ancillary information that adds context to the dataset. This metadata might include demographic details of the volunteers, such as gender and age, which may be useful for further analysis or interpretation.
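Because metadata and EEG rows may share the same text file, it can help to separate them before analysis. The following is a minimal sketch under stated assumptions: metadata lines are taken to be those that start with `%` (the header convention of OpenBCI GUI text exports) or whose first comma-separated field is not a number, and `key = value` entries are collected into a dictionary. The function names `split_metadata` and `metadata_dict` are hypothetical, and the actual metadata layout of these files should be checked before relying on this.

```python
from typing import Dict, Iterable, List, Tuple


def _looks_numeric(token: str) -> bool:
    try:
        float(token)
        return True
    except ValueError:
        return False


def split_metadata(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate metadata/header lines from data rows (assumed conventions)."""
    metadata: List[str] = []
    data: List[str] = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        first_field = text.split(",", 1)[0].strip()
        if text.startswith("%") or not _looks_numeric(first_field):
            # Assumption: '%'-prefixed or non-numeric lines are metadata/headers.
            metadata.append(text.lstrip("%").strip())
        else:
            data.append(text)
    return metadata, data


def metadata_dict(metadata: Iterable[str]) -> Dict[str, str]:
    """Collect 'key = value' metadata entries; other entries are ignored."""
    result: Dict[str, str] = {}
    for entry in metadata:
        if "=" in entry:
            key, value = entry.split("=", 1)
            result[key.strip()] = value.strip()
    return result
```

The data lines returned by `split_metadata` can then be parsed column by column, while the metadata entries remain available for context such as volunteer demographics, if the files include them.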