WAV

BC-AC Multimodal Emotional Speech Dataset (Partial Release)

To support research on multimodal speech emotion recognition (SER), we developed a dual-channel emotional speech database featuring synchronized recordings of bone-conducted (BC) and air-conducted (AC) speech. The recordings were conducted in a professionally treated anechoic chamber with 100 gender-balanced volunteers. AC speech was captured via a digital microphone on the left channel, while BC speech was recorded from an in-ear BC microphone on the right channel, both at a 44.1 kHz sampling rate to ensure high-fidelity audio.

Categories:

A PMUT-Based Bone Conduction Microphone System for Enhancing Speech Recognition Accuracy

Speech recognition in noisy environments has long posed a challenge. Typically used air conduction microphone (ACM) is susceptible to environmental noise.

Categories:

QiandaoEar22

QiandaoEar22 is a high-quality noise dataset designed for identifying specific ships among multiple underwater acoustic targets using ship-radiated noise. This dataset includes 9 hours and 28 minutes of real-world ship-radiated noise data and 21 hours and 58 minutes of background noise data.

Categories:

Thaat and Raga Forest (TRF) Dataset

This is the official Thaat and Raga Forest (TRF) Dataset

Please do cite our paper: Link to Paper

Dataset is also available here: Link to Dataset

Link to our repository: Github Repo

The "Thaat and Raga Forest (TRF) Dataset" represents

Categories:

AIR-RS-DB: All India Radio Read and Spontaneous Speech Data Base

AIR-RS-DB: A dataset for classifying Spontaneous and Read Speech

A set of 1028 audio files generated from 7 mp3 files downloaded from All India Radio. https://newsonair.gov.in/ and converted into wav and then speaker diarized is using https://huggingface.co/pyannote/speaker-diarization (pyannote/speaker-diarization@2022072,model) and derive 1028 audio files.

These are available as air-rs-db.zip (which can be downloaded)

Categories:

Signal Processing

Pipaset preview: A multimodal dataset dedicated to Chinese music instrument Pipa

The dataset consists of three parts, the first part consists of single notes and playing technique samples, and the second includes the triple viewed video, steoro-microphone recordings and 4 track optical vibration recordings in raw file for famous Chinese Folk music ‘Jasmine Flower’ and the first section of ‘Ambush from ten sides’. The third part concerns about the source separated tracks from optical recordings and expressive annotation files are included in the annotation files.

Categories:

Digital signal processing

Dataset: Talk the talk and walk the walk: Dialogue-driven navigation in unknown indoor environments

Dataset asscociated with a paper in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems

"Talk the talk and walk the walk: Dialogue-driven navigation in unknown indoor environments"

If you use this code or data, please cite the above paper.

Categories:

Neural Audio Fingerprint Dataset

Most of existing audio fingerprinting systems have limitations to be used for high-specific audio retrieval at scale. In this work, we generate a low-dimensional representation from a short unit segment of audio, and couple this fingerprint with a fast maximum inner-product search. To this end, we present a contrastive learning framework that derives from the segment-level search objective. Each update in training uses a batch consisting of a set of pseudo labels, randomly selected original samples, and their augmented replicas.

Categories:

simulated ofdm wav files

This dataset is generated by GNU Radio.

Categories:

Communications

Audio Steganalysis Dataset

The steganography and steganalysis of audio, especially compressed audio, have drawn increasing attention in recent years, and various algorithms are proposed. However, there is no standard public dataset for us to verify the efficiency of each proposed algorithm. Therefore, to promote the study field, we construct a dataset including 33038 stereo WAV audio clips with a sampling rate of 44.1 kHz and duration of 10s. And, all audio files are from the Internet through data crawling, which is for a better simulation of a real detection environment.

Categories:

BC-AC Multimodal Emotional Speech Dataset (Partial Release)

A PMUT-Based Bone Conduction Microphone System for Enhancing Speech Recognition Accuracy

QiandaoEar22

Thaat and Raga Forest (TRF) Dataset

This is the official Thaat and Raga Forest (TRF) Dataset

Please do cite our paper: Link to Paper

Dataset is also available here: Link to Dataset

Link to our repository: Github Repo

The "Thaat and Raga Forest (TRF) Dataset" represents

AIR-RS-DB: All India Radio Read and Spontaneous Speech Data Base

Pipaset preview: A multimodal dataset dedicated to Chinese music instrument Pipa

Dataset: Talk the talk and walk the walk: Dialogue-driven navigation in unknown indoor environments

Category

Neural Audio Fingerprint Dataset

Category

simulated ofdm wav files

Audio Steganalysis Dataset