The modified CASIA dataset was created for research on topics such as perceptual image hashing, image tampering detection and user-device physical unclonable functions (PUFs).
The ground-truth (authentic) images are taken from the CASIA Image Tampering Detection Evaluation (ITDE) v1.0 database (J. Dong et al., "CASIA Image Tampering Detection Evaluation Database," ChinaSIP 2013), which contains images of size 384×256 or 256×384 from eight categories: animal, architecture, article, character, nature, plant, scene and texture. Instead of using the tampered image set from CASIA ITDE v1.0 directly, the tampered versions of these authentic images are selected from CASIA ITDE v2.0. That version is more challenging and comprehensive because post-processing such as blurring or filtering is applied over the tampered regions to make the tampered images look realistic to human eyes. CASIA ITDE v2.0 may contain several tampered versions of a single authentic image; to increase diversity, only one tampered version is kept per authentic image. As a result, the modified CASIA database contains 400 authentic images (8 categories × 50 images per category) and their corresponding tampered versions.
According to CASIA ITDE v2.0, the tampered images were generated from the authentic images with crop-and-paste operations in Adobe Photoshop, and the tampered regions may have random shapes and different sizes, rotations or distortions. To evaluate the performance of the proposed system under content-preserving manipulations, we enrich the modified CASIA dataset by applying such manipulations to the authentic images using MATLAB and ImageJ. Both common image processing operations (such as rotation, scaling, filtering and JPEG compression) and unavoidable processing/transmission noise (such as Gaussian, salt-and-pepper and speckle noise) are considered. The same content-preserving manipulations are also applied to the tampered images, to evaluate whether combining them with tampering can evade detection. As a result, the modified CASIA database D contains:
1) D_au: 400 authentic images in 8 categories, 50 images per category;
2) D_tampered: 400 tampered images corresponding to the authentic ones from D_au;
3) D_au_cp: 3,600 (400 × 9) images generated by applying a single content-preserving manipulation to every image of D_au, for each of 9 types: Gaussian noise, salt-and-pepper noise, speckle noise, Gaussian filter, motion blur, JPEG compression, gamma correction, rotation and scaling;
4) D_tampered_cp: 3,600 (400 × 9) tampered images obtained by applying the 9 content-preserving manipulations listed in 3) to the images of D_tampered.
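The noise, filtering and gamma manipulations listed in 3) can be sketched in NumPy as follows. This is a hypothetical re-creation, not the MATLAB/ImageJ code used to build the dataset: the function names and every parameter value (noise variances, salt-and-pepper density, gamma, blur sigma and motion length) are assumptions, and images are assumed to be 2-D grayscale float arrays scaled to [0, 1].

```python
import numpy as np

# Hypothetical sketch of some content-preserving (cp) manipulations.
# Images: 2-D grayscale float arrays in [0, 1]. All parameters are assumptions.

def gaussian_noise(img, rng, mean=0.0, var=0.01):
    """Additive Gaussian noise, clipped back to [0, 1]."""
    noisy = img + rng.normal(mean, np.sqrt(var), img.shape)
    return np.clip(noisy, 0.0, 1.0)

def salt_and_pepper(img, rng, density=0.05):
    """Set a fraction `density` of pixels to 0 (pepper) or 1 (salt), half each."""
    out = img.copy()
    u = rng.random(img.shape)
    out[u < density / 2] = 0.0
    out[(u >= density / 2) & (u < density)] = 1.0
    return out

def speckle_noise(img, rng, var=0.05):
    """Multiplicative noise img + n * img, n uniform with mean 0 and variance var."""
    half_width = np.sqrt(3.0 * var)  # uniform on [-a, a] has variance a**2 / 3
    n = rng.uniform(-half_width, half_width, img.shape)
    return np.clip(img + n * img, 0.0, 1.0)

def gamma_correction(img, gamma=1.5):
    """Pointwise power-law transform; 0 and 1 are left unchanged."""
    return img ** gamma

def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian filter with edge padding; output keeps the input shape."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def motion_blur(img, length=9):
    """Horizontal motion blur: mean of `length` neighbours along each row."""
    if length % 2 == 0:
        raise ValueError("length must be odd")
    k = np.full(length, 1.0 / length)
    padded = np.pad(img, ((0, 0), (length // 2, length // 2)), mode="edge")
    return np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
```

Rotation, scaling and JPEG compression are left out because they need resampling or an encoder; with Pillow they could be done through `Image.rotate`, `Image.resize` and `Image.save(..., format="JPEG", quality=...)`, again with parameters that are not specified here.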
To use the dataset, please cite:
Y. Zheng, Y. Cao and C. Chang, "A PUF-Based Data-Device Hash for Tampered Image Detection and Source Camera Identification," IEEE Transactions on Information Forensics and Security, vol. 15, pp. 620-634, 2020.
J. Dong, W. Wang and T. Tan, "CASIA Image Tampering Detection Evaluation Database," 2013 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP), Beijing, 2013, pp. 422-426.
"training" folder: 160 of the images are placed in the "training" folder. Inside it, the "au" folder contains the 160 original (authentic) images without any modification, and each "au_cp_**" folder contains the same 160 images after a content-preserving (cp) operation. For example, the images in "au_cp_gamma" are obtained by applying gamma correction to the authentic images in the "au" folder. The "tampered" folder contains the tampered versions of the images in the "au" folder.
"Au_ani_0001" in "au" folder: authentic image, animal category, index 0001;
"ani00008_ani00011_105" in "tampered" folder: tampered version of image "Au_ani_0008", created by pasting part of the content of image "Au_ani_0011". Both "Au_ani_0008" and "Au_ani_0011" are in the "au" folder.
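The two naming examples above can be turned into a small parser that links each tampered image back to its authentic target and donor images. This is a minimal sketch that assumes every file name follows exactly the two patterns shown; the regular expressions, the helper names (`parse_authentic`, `parse_tampered`, `authentic_stem`) and the 4-digit zero padding (inferred from "Au_ani_0001") are assumptions, and the meaning of the trailing number (e.g. "105") is not described here, so it is kept as an opaque string.

```python
import re
from pathlib import Path
from typing import NamedTuple, Optional, Tuple

# Patterns inferred from "Au_ani_0001" (authentic) and
# "ani00008_ani00011_105" (tampered); treat them as assumptions.
AUTHENTIC = re.compile(r"^Au_([a-z]{3})_(\d+)$")
TAMPERED = re.compile(r"^([a-z]{3})(\d+)_([a-z]{3})(\d+)_(\d+)$")


class TamperedName(NamedTuple):
    target: str  # authentic image that was tampered, e.g. "Au_ani_0008"
    donor: str   # authentic image whose content was pasted in, e.g. "Au_ani_0011"
    suffix: str  # trailing number; its meaning is not documented


def authentic_stem(code: str, index: int) -> str:
    """Rebuild an "au" file stem from a 3-letter category code and an index."""
    return f"Au_{code}_{index:04d}"


def parse_authentic(filename: str) -> Optional[Tuple[str, int]]:
    """Return (category code, index) for an authentic image name, else None."""
    m = AUTHENTIC.match(Path(filename).stem)
    return (m.group(1), int(m.group(2))) if m else None


def parse_tampered(filename: str) -> Optional[TamperedName]:
    """Return the target/donor authentic stems for a tampered image name, else None."""
    m = TAMPERED.match(Path(filename).stem)
    if m is None:
        return None
    tcode, tidx, dcode, didx, suffix = m.groups()
    return TamperedName(authentic_stem(tcode, int(tidx)),
                        authentic_stem(dcode, int(didx)), suffix)
```

Converting the 5-digit index in the tampered name ("00008") to an integer before re-padding is what lets it match the 4-digit index of the authentic file ("0008").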
"testing" folder: the remaining 240 images are placed in the "testing" folder. Its sub-folders and images follow the same naming rules as the "training" folder. Inside it, the "tampered_with_cp" folder corresponds to D_tampered_cp as described above; for example, the "testing/tampered_with_cp/gamma" folder contains authentic images to which both a tampering operation and gamma correction have been applied.
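The folder layout described above can be indexed with a short standard-library sketch. It assumes that "training" and "testing" sit directly under a common dataset root, and it recognises only the sub-folder names mentioned here ("au", "au_cp_*", "tampered" and "tampered_with_cp/<operation>"); the `Record` type and the `classify`/`index_dataset` helpers are hypothetical, and any other folder is skipped rather than guessed at.

```python
from pathlib import Path
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    split: str            # "training" or "testing"
    tampered: bool        # True for "tampered" and "tampered_with_cp"
    cp_op: Optional[str]  # e.g. "gamma"; None if no content-preserving op
    path: Path            # path relative to the dataset root


def classify(rel: Path) -> Optional[Record]:
    """Label one file path (relative to the dataset root) by its folders."""
    parts = rel.parts
    if len(parts) < 3 or parts[0] not in ("training", "testing"):
        return None  # e.g. a README at the root
    split, folder = parts[0], parts[1]
    if folder == "au":
        return Record(split, False, None, rel)
    if folder.startswith("au_cp_"):
        return Record(split, False, folder[len("au_cp_"):], rel)
    if folder == "tampered":
        return Record(split, True, None, rel)
    if folder == "tampered_with_cp" and len(parts) >= 4:
        return Record(split, True, parts[2], rel)
    return None  # unknown layout: skip instead of guessing


def index_dataset(root: str) -> List[Record]:
    """Walk the dataset root and return one Record per recognised file."""
    base = Path(root)
    records = []
    for p in sorted(base.rglob("*")):
        if p.is_file():
            rec = classify(p.relative_to(base))
            if rec is not None:
                records.append(rec)
    return records
```

Selecting the records with `split == "testing"`, `tampered` true and `cp_op == "gamma"` would, under these assumptions, return the contents of "testing/tampered_with_cp/gamma".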