Standard Dataset
IIST BCI Dataset-8 for Selected Common Telugu Words of Male and Female Speakers
- Submitted by: likhith boddapu
- Last updated: Thu, 09/05/2024 - 15:02
- DOI: 10.21227/1xfr-y802
Abstract
Brain-Computer Interface (BCI) technology enables a direct connection between the brain and external devices by interpreting neural signals. When developing BCI-based solutions for neurological disorders, it is critical to have datasets in patients' native languages. However, current BCI research lacks appropriate language-specific datasets, particularly for languages such as Telugu, which is spoken by more than 90 million people in India. We created an electroencephalography (EEG)-based BCI dataset containing EEG signal samples corresponding to widely spoken Telugu words for both female and male speakers. The data were recorded with the OpenBCI Cyton device from two Telugu-speaking participants. The dataset is divided into four parts:
1. Vocalized Telugu words.
2. English translations of Telugu words.
3. Subvocalization of Telugu words.
4. Subvocalization of English words.
The dataset includes 100 different words, each recorded over ten trials for one male and one female speaker. By training Machine Learning (ML) and Deep Learning (DL) models on this dataset, researchers can build a BCI system capable of translating EEG signals from both vocalized and subvocalized speech in Telugu and English.
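The abstract leaves the ML/DL pipeline open, so the following is only a minimal sketch of one possible baseline, not the authors' method. It assumes (hypothetically) that each trial is an 8-channel array sampled at 250 Hz, which are OpenBCI Cyton defaults; the actual file format is not described on this page, so synthetic trials stand in for real recordings. The function names `band_power_features`, `fit_centroids`, `predict`, and `synthetic_trials`, the frequency bands, and the nearest-centroid classifier are all illustrative choices. Whole trial repetitions are held out for testing so that no test trial contributes to training.

```python
import numpy as np

# Hypothetical recording parameters: OpenBCI Cyton boards stream 8 channels
# at 250 Hz by default; check the dataset's own files before relying on this.
FS = 250
N_CHANNELS = 8
# Illustrative EEG frequency bands in Hz (an assumption, not from the dataset).
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}


def band_power_features(trial: np.ndarray, fs: int = FS) -> np.ndarray:
    """Return the log mean spectral power per (band, channel) for one
    (channels, samples) trial, flattened into a 1-D feature vector."""
    freqs = np.fft.rfftfreq(trial.shape[1], d=1.0 / fs)
    power = np.abs(np.fft.rfft(trial, axis=1)) ** 2
    feats = []
    for lo, hi in BANDS.values():
        in_band = (freqs >= lo) & (freqs < hi)
        feats.append(np.log(power[:, in_band].mean(axis=1) + 1e-12))
    return np.concatenate(feats)


def fit_centroids(X: np.ndarray, y: np.ndarray):
    """Nearest-centroid baseline: one mean feature vector per word label."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(classes, centroids, X: np.ndarray) -> np.ndarray:
    # Squared Euclidean distance from every sample to every centroid.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]


def synthetic_trials(n_words=5, n_trials=10, seconds=2.0, seed=0):
    """Stand-in for real recordings: word k adds a 10 Hz rhythm on channel k."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(seconds * FS)) / FS
    X, words, trial_ids = [], [], []
    for word in range(n_words):
        for trial in range(n_trials):
            eeg = rng.standard_normal((N_CHANNELS, t.size))
            eeg[word] += 3.0 * np.sin(2 * np.pi * 10.0 * t)
            X.append(band_power_features(eeg))
            words.append(word)
            trial_ids.append(trial)
    return np.array(X), np.array(words), np.array(trial_ids)


if __name__ == "__main__":
    X, y, trial_ids = synthetic_trials()
    # Hold out whole trial repetitions (8 and 9) so no test trial is seen in training.
    test = np.isin(trial_ids, [8, 9])
    classes, centroids = fit_centroids(X[~test], y[~test])
    accuracy = (predict(classes, centroids, X[test]) == y[test]).mean()
    print(f"held-out accuracy: {accuracy:.2f}")
```

With real data, `synthetic_trials` would be replaced by a loader for the dataset's actual files, and the same trial-wise split could be applied per speaker and per part (vocalized or subvocalized, Telugu or English). A nearest-centroid model is chosen here only because it is short and transparent; the abstract's ML and DL approaches would typically replace it.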