Detecting and Localizing Text-Image Synchronization Forgery
- Submitted by: Zhigeng Han
- Last updated: Sun, 02/16/2025 - 02:23
- DOI: 10.21227/0pap-1m14
Abstract
DLSF is the first dataset dedicated to Text-Image Synchronization Forgery (TISF) in multimodal media. The source data was scraped from Toutiao, a Chinese news aggregation platform. The dataset contains text, image, and audio-video data from news articles about politicians and celebrities, with samples of both entity-level and attribute-level TISF. Its annotations cover text-image authenticity labels, TISF types, image forgery regions, and text forgery tokens. The current DLSF dataset consists of 2,200 image-text-audio-video samples, including 179 attribute-level TISF samples (FA+TA: face attribute editing paired with text attribute editing) and 279 entity-level TISF samples (FS+TS: face swapping paired with entity name replacement in the text). It is designed to evaluate how well models detect and localize TISF.
The DLSF dataset includes the files train_v1.3.json and test_v1.3.json, with the data organized as follows:
{
  "title": "房产过户遵从遗嘱保障权益",
  "video_path": "./Data/videos/o8DEmpgEh7zAIDkfmdBxdzxJEujAeBQvIxUPtg.mp4",
  "image_path": "./Data/images/7369231429441765899.jpg",
  "fake_text_pos": [4, 5, 6, 7, 8, 9, 10, 11],
  "bbox": [157, 128, 355, 392],
  "fake_cls": "face_attribute&text_attribute",
  "con_label": 0
},
title represents the news headline text.
video_path represents the storage path for the video.
image_path represents the storage path for the news images.
fake_text_pos lists the positions of the words that were altered in the text.
bbox gives the bounding box of the tampered region in the image (four integers in the example above).
fake_cls gives the type of text-image synchronization forgery. Its components are face_attribute (image attribute editing), face_swap (face swapping), text_attribute (text attribute editing), and text_swap (entity name replacement); the example above joins an image component and a text component with "&".
con_label indicates whether the text-image pair is synchronously forged (0 for forged, 1 for not forged).
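The field descriptions above can be turned into a small loader. The following is a minimal sketch, not an official reader for the dataset. It assumes that each JSON file holds a top-level array of records shaped like the example, that fake_cls components are always joined with "&", and that authentic samples may leave fake_text_pos, bbox, or fake_cls empty or absent. The names DLSFSample, parse_sample, and load_split are hypothetical and do not come from the dataset.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class DLSFSample:
    """One annotated record (hypothetical container, not part of DLSF)."""
    title: str
    video_path: str
    image_path: str
    fake_text_pos: List[int]
    bbox: List[int]
    fake_cls: List[str]
    is_forged: bool


def parse_sample(record: dict) -> DLSFSample:
    """Convert one raw JSON record into a DLSFSample."""
    # Assumption: forgery-specific fields may be missing or null
    # for authentic samples, so they default to empty values.
    raw_cls = record.get("fake_cls") or ""
    return DLSFSample(
        title=record["title"],
        video_path=record["video_path"],
        image_path=record["image_path"],
        fake_text_pos=list(record.get("fake_text_pos") or []),
        bbox=list(record.get("bbox") or []),
        # "face_attribute&text_attribute" -> ["face_attribute", "text_attribute"]
        fake_cls=[part for part in raw_cls.split("&") if part],
        # The dataset description states: 0 for forged, 1 for not forged.
        is_forged=record["con_label"] == 0,
    )


def load_split(path: str) -> List[DLSFSample]:
    """Load a split file, assuming it contains a JSON array of records."""
    with open(path, encoding="utf-8") as f:
        return [parse_sample(r) for r in json.load(f)]


# The example record shown in the dataset description.
EXAMPLE_RECORD = {
    "title": "房产过户遵从遗嘱保障权益",
    "video_path": "./Data/videos/o8DEmpgEh7zAIDkfmdBxdzxJEujAeBQvIxUPtg.mp4",
    "image_path": "./Data/images/7369231429441765899.jpg",
    "fake_text_pos": [4, 5, 6, 7, 8, 9, 10, 11],
    "bbox": [157, 128, 355, 392],
    "fake_cls": "face_attribute&text_attribute",
    "con_label": 0,
}

sample = parse_sample(EXAMPLE_RECORD)
print(sample.fake_cls, sample.is_forged)
```

Converting con_label into a boolean named is_forged is a design choice that avoids the easy mistake of reading 0 as "not forged"; a full split could then be read with load_split("train_v1.3.json"), provided the file layout matches the assumptions above.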
In addition, the DLSF dataset includes the following folders:
The videos folder contains the original news videos.
The images folder contains both the original and tampered news images.
The audio folder contains encoded audio data, stored in .npy format.
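Because the JSON records refer to media through relative paths such as ./Data/images/..., it can help to check that every referenced file is present after extraction. The sketch below assumes these paths are relative to the directory the archive was extracted into, which the dataset description does not state explicitly. The functions resolve_media_paths and missing_files are hypothetical helpers. Audio files are not covered, since the records shown do not describe how the .npy files are named.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def resolve_media_paths(record: dict, dataset_root: str) -> Dict[str, Path]:
    """Join a record's relative media paths onto the dataset root.

    Assumption: "./Data/..." is relative to dataset_root.
    """
    root = Path(dataset_root)
    return {
        key: root / record[key]
        for key in ("video_path", "image_path")
        if record.get(key)
    }


def missing_files(records: Iterable[dict], dataset_root: str) -> List[Path]:
    """Return every referenced video or image path that is not on disk."""
    return [
        path
        for record in records
        for path in resolve_media_paths(record, dataset_root).values()
        if not path.is_file()
    ]


record = {
    "video_path": "./Data/videos/o8DEmpgEh7zAIDkfmdBxdzxJEujAeBQvIxUPtg.mp4",
    "image_path": "./Data/images/7369231429441765899.jpg",
}
print(resolve_media_paths(record, "dlsf_root")["image_path"])
```

Running missing_files over all records of a split before training gives an early signal that the assumed root directory is wrong or that the download is incomplete.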