Abstract 

The modified CASIA dataset was created for research topics such as perceptual image hashing, image tampering detection, and user-device physical unclonable functions.

The ground truth images are taken from the CASIA image tampering detection evaluation (ITDE) v1.0 database (J. Dong et al., CASIA image tampering detection evaluation database, ChinaSIP 2013), which contains images from eight categories (animal, architecture, article, character, nature, plant, scene and texture) of size 384x256 or 256x384. Instead of directly using the tampered image set from CASIA ITDE v1.0, the tampered versions of those authentic images are selected from CASIA ITDE v2.0, which is more challenging and comprehensive because it applies post-processing such as blurring or filtering to the tampered regions so that the tampered images appear realistic to human eyes. For one authentic image, there may be several tampered versions in the CASIA ITDE v2.0 dataset. To increase the diversity, only one tampered version is kept for each authentic image. As a result, the modified CASIA database contains 400 (8 categories x 50 per category) authentic images and their corresponding tampered versions.
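
The "one tampered version per authentic image" rule can be sketched as below. This is only an illustration: the image identifiers are hypothetical, and the description above does not state how the kept version was chosen, so taking the alphabetically first candidate is an arbitrary assumption rather than the authors' procedure.

```python
from collections import defaultdict


def keep_one_tampered(pairs):
    """Keep a single tampered image per authentic image.

    pairs: iterable of (authentic_id, tampered_id) tuples.
    Returns a dict mapping each authentic_id to one tampered_id.
    """
    by_authentic = defaultdict(list)
    for authentic_id, tampered_id in pairs:
        by_authentic[authentic_id].append(tampered_id)
    # Assumption: pick the alphabetically first candidate. The dataset
    # description does not specify the actual selection criterion.
    return {a: min(candidates) for a, candidates in by_authentic.items()}


# Hypothetical identifiers, not real CASIA file names.
pairs = [
    ("au_001", "tp_001_b"),
    ("au_001", "tp_001_a"),
    ("au_002", "tp_002_a"),
]
selected = keep_one_tampered(pairs)
```

Here `selected` holds exactly one tampered identifier for each authentic identifier, mirroring the one-to-one pairing of the 400 authentic and 400 tampered images.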

According to CASIA ITDE v2.0, the tampered images are generated by applying crop-and-paste operations to the authentic images in Adobe Photoshop, and the tampered regions may have random shapes and different sizes, rotations or distortions. To evaluate the performance of the proposed system under content-preserving manipulations, we enrich the modified CASIA dataset by applying content-preserving manipulations to the authentic images using MATLAB and ImageJ. Common image processing techniques such as rotation, scaling, filtering and JPEG compression are considered, as well as unavoidable processing and transmission noise such as Gaussian, salt-and-pepper and speckle noise. Furthermore, the same content-preserving manipulations are applied to the tampered dataset to evaluate whether the combination of tampering and manipulation can evade detection. As a result, the modified CASIA database D contains:

1) D_au: 400 authentic images in 8 categories, each with 50 images;

2) D_tampered: 400 tampered images corresponding to the authentic ones from D_au;

3) D_au_cp: 3600 (400 x 9) images generated by applying a single content-preserving manipulation (9 types: Gaussian noise, salt-and-pepper noise, speckle noise, Gaussian filter, motion blur, JPEG compression, gamma correction, rotation and scaling) to every image of D_au;

4) D_tampered_cp: 3600 (400 x 9) tampered images generated by applying the 9 content-preserving manipulations listed in 3) to every image of D_tampered.
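
The subset sizes listed above follow directly from the category, image and manipulation counts. The short sketch below reproduces that arithmetic. The constant and function names are illustrative and not part of the dataset, and the overall total is simply the sum of the four stated subset sizes.

```python
# Categories and manipulation types as listed in the description.
CATEGORIES = [
    "animal", "architecture", "article", "character",
    "nature", "plant", "scene", "texture",
]
IMAGES_PER_CATEGORY = 50
MANIPULATIONS = [
    "Gaussian noise", "salt-and-pepper noise", "speckle noise",
    "Gaussian filter", "motion blur", "JPEG compression",
    "gamma correction", "rotation", "scaling",
]


def subset_sizes():
    """Return the number of images in each subset of the database D."""
    n_authentic = len(CATEGORIES) * IMAGES_PER_CATEGORY      # 8 x 50
    n_manipulated = n_authentic * len(MANIPULATIONS)         # 400 x 9
    return {
        "D_au": n_authentic,
        "D_tampered": n_authentic,       # one tampered image per authentic one
        "D_au_cp": n_manipulated,
        "D_tampered_cp": n_manipulated,
    }


sizes = subset_sizes()
total = sum(sizes.values())  # all four subsets together
```

A check of this kind can be useful after downloading the files, for example to confirm that each subset folder contains the expected number of images.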

If you use the dataset, please cite:

[1] Y. Zheng, Y. Cao and C. Chang, "A PUF-Based Data-Device Hash for Tampered Image Detection and Source Camera Identification," in IEEE Transactions on Information Forensics and Security, vol. 15, pp. 620-634, 2020.

[2] J. Dong, W. Wang and T. Tan, "CASIA Image Tampering Detection Evaluation Database," 2013 IEEE China Summit and International Conference on Signal and Information Processing, Beijing, 2013, pp. 422-426.

