This is a dataset consisting of 8 features extracted from 70,000 monochromatic still images adapted from the Genome Project Standford's database, that are labeled in two classes: LSB steganography (1) and without LSB Steganography (0). These features are Kurtosis, Skewness, Standard Deviation, Range, Median, Geometric Mean, Hjorth Mobility, and Hjorth Complexity, all extracted from the histograms of the still images, including random spatial transformations. The steganographic function embeds five types of payloads, from 0.1 to 0.5.


The steganography and steganalysis of audio, especially compressed audio, have drawn increasing attention in recent years, and various algorithms are proposed. However, there is no standard public dataset for us to verify the efficiency of each proposed algorithm. Therefore, to promote the study field, we construct a dataset including 33038 stereo WAV audio clips with a sampling rate of 44.1 kHz and duration of 10s. And, all audio files are from the Internet through data crawling, which is for a better simulation of a real detection environment.


Mp3 is a very popular audio format and hence it can be a good host for carrying hidden messages. Therefore, different steganography methods have been proposed for mp3 hosts. UnderMp3Cover is one of such algorithms and has some important advantage over other comparable methods. First, the popular steganography method mp3stego, works directly on non-compressed samples. Therefore, using covers that have been compressed before could lead to serious degradation of its security. UnderMp3Cover does not have this important limitation.