3D Dataset; Pattern Recognition; Machine Learning; Computer Vision
Voice recognition plays an essential role in human-computer interaction and various technological applications. However, identifying individual speakers remains a significant challenge, especially in diverse and acoustically challenging environments. This paper presents the Enhanced Multi-Layer Convolutional Neural Network (EML-CNN), a novel approach to improving automated speaker recognition from speech audio. The EML-CNN architecture features multiple convolutional layers and a dense block, finely tuned to extract unique voice signatures from English speech samples.
The Numerical Latin Letters (DNLL) dataset consists of Latin numeric letters organized into 26 distinct letter classes, corresponding to the Latin alphabet. Each class within this dataset encompasses multiple letter forms, resulting in a diverse and extensive collection. These letters vary in color, size, writing style, thickness, background, orientation, luminosity, and other attributes, making the dataset highly comprehensive and rich.
Our video action dataset is generated using a 3D simulation program developed in Unity. Each data sample consists of a video capturing a human performing various actions. Our initial set of actions comprises a total of 10 different yoga poses: camel, chair, child's pose, lord of the dance, lotus, thunderbolt, triangle, upward dog, warrior II, and warrior III. Within each of these 10 yoga poses, there are four variations, some exhibiting more pronounced differences than others. This results in a total of 40 action types within our dataset.
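The label space described above (10 poses, four variations each, 40 action types) can be sketched as a simple enumeration. This is only an illustration: the pose names come from the description, while the variation identifiers `1` to `4`, the helper `build_label_map`, and the integer class indices are assumptions, not the dataset's actual labeling scheme.

```python
from itertools import product

# Pose names as listed in the dataset description.
POSES = [
    "camel", "chair", "child's pose", "lord of the dance", "lotus",
    "thunderbolt", "triangle", "upward dog", "warrior II", "warrior III",
]

# Hypothetical variation IDs: the description only states that each pose
# has four variations, not how they are named.
VARIATIONS = [1, 2, 3, 4]


def build_label_map(poses, variations):
    """Map each (pose, variation) pair to an integer class index (assumed scheme)."""
    return {pair: idx for idx, pair in enumerate(product(poses, variations))}


label_map = build_label_map(POSES, VARIATIONS)
print(len(label_map))  # number of action types: 10 poses x 4 variations
```

A flat index like this suits a 40-way classifier; a model that predicts pose and variation separately could keep the `(pose, variation)` tuple instead.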