Audio Steganalysis Dataset

Citation Author(s):
Yuntao Wang
Kun Yang
Yunzhao Yang
Jinghong Zhang
Xianfeng Zhao (State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, and School of Cyber Security, University of Chinese Academy of Sciences)
Submitted by:
yuntao wang
DOI:
10.21227/rab0-vf56

Abstract

The steganography and steganalysis of audio, especially compressed audio, have drawn increasing attention in recent years, and various algorithms have been proposed. However, there is no standard public dataset for evaluating each proposed algorithm. To promote research in this field, we constructed a dataset of 33,038 stereo WAV audio clips with a sampling rate of 44.1 kHz and a duration of 10 s. All audio files were collected from the Internet by web crawling, to better simulate a real detection environment. At this stage, the dataset is intended for MP3 steganalysis. We provide the corresponding MP3 encoder, LAME, and steganographic encoders such as HCM and EECS, which were developed on the basis of LAME. Some useful Python scripts are also supplied for generating samples in batches. The dataset is still expanding, and we will include AAC, AMR, and other audio formats in the future.
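As a quick sanity check after downloading, the clip properties stated above (stereo, 44.1 kHz, 10 s) can be verified with Python's standard `wave` module. This is a minimal sketch, not one of the supplied scripts: the function names and the duration tolerance are assumptions made for illustration, and `wave` only reads uncompressed PCM WAV files.

```python
import wave

# Properties stated in the abstract.
EXPECTED_CHANNELS = 2        # stereo
EXPECTED_RATE_HZ = 44100     # 44.1 kHz
EXPECTED_DURATION_S = 10.0   # 10 s clips


def describe_wav(path):
    """Return channel count, sampling rate and duration of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.getnframes()
    return {
        "channels": channels,
        "sample_rate": rate,
        "duration_s": frames / rate,
    }


def matches_description(info, tolerance_s=0.1):
    """Check a clip against the stated properties.

    tolerance_s is an assumed slack for clips that are not exactly 10 s long.
    """
    return (
        info["channels"] == EXPECTED_CHANNELS
        and info["sample_rate"] == EXPECTED_RATE_HZ
        and abs(info["duration_s"] - EXPECTED_DURATION_S) <= tolerance_s
    )
```

Calling `matches_description(describe_wav("clip.wav"))` on each downloaded file (the file name here is hypothetical) flags clips that deviate from the description before any encoding is attempted.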

Keywords: Audio, MP3, Steganalysis, Steganography

Instructions:

1. Download a dataset of the appropriate size for your needs.

2. Generate samples with the script "samples_make.py".

3. Extract QMDCT coefficients with the script "QMDCT_extraction.py".

4. Design your own handcrafted features or networks.

More information is provided in "instruction.md".
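To give a rough idea of what batch sample generation involves, the sketch below encodes every WAV file in a folder to MP3 with the standard LAME command-line encoder. It is an illustration only and does not reproduce "samples_make.py": the directory names, function names, and default bitrate are assumptions, and only cover (non-stego) encoding is shown, because the command-line options of the steganographic encoders are not described on this page.

```python
import subprocess
from pathlib import Path


def build_lame_commands(wav_dir, out_dir, bitrate_kbps=128, lame_bin="lame"):
    """Build one LAME command per WAV file found in wav_dir.

    `lame -b <kbps> in.wav out.mp3` encodes at the given bitrate.
    wav_dir, out_dir and the 128 kbps default are illustrative assumptions.
    """
    commands = []
    for wav_path in sorted(Path(wav_dir).glob("*.wav")):
        mp3_path = Path(out_dir) / (wav_path.stem + ".mp3")
        commands.append(
            [lame_bin, "-b", str(bitrate_kbps), str(wav_path), str(mp3_path)]
        )
    return commands


def encode_all(commands):
    """Run each command in turn; raise if the encoder reports an error."""
    for cmd in commands:
        # Make sure the output folder exists before the encoder writes to it.
        Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
```

Separating command construction from execution makes it possible to inspect or log the commands first, for example `encode_all(build_lame_commands("wav", "mp3/cover"))` with hypothetical folder names, and requires LAME to be installed and on the `PATH`.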

This dataset is well-organized and extremely valuable for audio steganalysis. I believe that this dataset will inspire much related research.

Fengchun Qiao Mon, 05/06/2019 - 13:35 Permalink

A nice dataset. Anyone working on audio steganography and steganalysis can try it out.

jing jing Tue, 05/07/2019 - 06:55 Permalink

A good dataset for researchers delving into audio steganography and steganalysis!

Haorui Wu Wed, 05/08/2019 - 08:51 Permalink

The dataset plays an important role in audio steganalysis research. It is especially useful.

yu zhang Sat, 10/19/2019 - 08:26 Permalink

Thanks to the author for providing a good dataset for audio steganography and steganalysis. It is very useful.

Liangliang Wu Sun, 10/20/2019 - 12:57 Permalink

What a nice dataset. It's very helpful for me.

edhs Liu Mon, 10/21/2019 - 13:00 Permalink

Sorry, how do I download the dataset files on the right of this page? Thank you.

dwyane wade Sat, 11/23/2019 - 06:32 Permalink

Could you please tell me how to download the dataset files listed above on this page?

BR,

Shahzad Ahmad Qureshi, Dr

Shahzad Ahmad … Mon, 02/17/2020 - 14:03 Permalink

Thanks for sharing. I am not surprised that the author can write such a good article, because he is such a specialist.

shuai Huang Wed, 05/27/2020 - 03:41 Permalink
Hello, may I ask: are only 10,000 of the 33,038 samples provided here?
zhongyuan wei Thu, 11/19/2020 - 09:21 Permalink