Detecting and Localizing Text-Image Synchronization Forgery

- Citation Author(s): Jian Chen
- Submitted by: Zhigeng Han
- DOI: 10.21227/0pap-1m14
Abstract
DLSF is the first dedicated dataset for Text-Image Synchronization Forgery (TISF) in multimodal media. Its source data was scraped from Toutiao, a Chinese news aggregation platform. The dataset contains text, image, and audio-video data from news articles about politicians and celebrities, and includes samples of both entity-level and attribute-level TISF. Its annotations cover text-image authenticity labels, TISF types, image forgery regions, and text forgery tokens. The current DLSF dataset consists of 2,200 image-text-audio-video sample pairs, including 179 pairs of attribute-level TISF samples (FA+TA) and 279 pairs of entity-level TISF samples (FS+TS). It is designed to evaluate how well models detect and localize TISF.
Instructions:
The DLSF dataset includes the files train_v1.3.json and test_v1.3.json, with the data organized as follows:
{
    "title": "房产过户遵从遗嘱保障权益",
    "video_path": "./Data/videos/o8DEmpgEh7zAIDkfmdBxdzxJEujAeBQvIxUPtg.mp4",
    "image_path": "./Data/images/7369231429441765899.jpg",
    "fake_text_pos": [4, 5, 6, 7, 8, 9, 10, 11],
    "bbox": [157, 128, 355, 392],
    "fake_cls": "face_attribute&text_attribute",
    "con_label": 0
},
title is the news headline text.
video_path is the storage path of the news video.
image_path is the storage path of the news image.
fake_text_pos lists the positions of the tokens in title that were altered.
bbox gives the tampered region of the image as four integer coordinates.
fake_cls is the type of text-image synchronization forgery. Its values are face_attribute (image attribute editing), face_swap (face swapping), text_attribute (text attribute editing), and text_swap (entity name replacement); an image type and a text type are joined with "&", as in face_attribute&text_attribute.
con_label indicates whether the text-image pair is synchronously forged (0 for forged, 1 for not forged).
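The fields above can be read with the Python standard library alone. The following is a minimal sketch, assuming each annotation file holds a JSON array of records shaped like the example; that layout is not confirmed by the dataset description. The Sample class and the parse_record function are hypothetical helpers introduced only for illustration.

```python
import json
from dataclasses import dataclass

# One record copied from the example above. The real annotation files are
# ASSUMED (not confirmed) to be JSON arrays of objects with this shape.
RAW = """
[{
  "title": "房产过户遵从遗嘱保障权益",
  "video_path": "./Data/videos/o8DEmpgEh7zAIDkfmdBxdzxJEujAeBQvIxUPtg.mp4",
  "image_path": "./Data/images/7369231429441765899.jpg",
  "fake_text_pos": [4, 5, 6, 7, 8, 9, 10, 11],
  "bbox": [157, 128, 355, 392],
  "fake_cls": "face_attribute&text_attribute",
  "con_label": 0
}]
"""


@dataclass
class Sample:
    """Hypothetical container for one record; not part of the dataset."""
    title: str
    image_path: str
    video_path: str
    fake_text_pos: list
    bbox: list
    forgery_types: list
    is_forged: bool


def parse_record(rec: dict) -> Sample:
    """Convert one raw JSON object into a Sample."""
    fake_cls = rec.get("fake_cls", "")
    return Sample(
        title=rec["title"],
        image_path=rec["image_path"],
        video_path=rec["video_path"],
        fake_text_pos=list(rec.get("fake_text_pos", [])),
        bbox=list(rec.get("bbox", [])),
        # "face_attribute&text_attribute" -> ["face_attribute", "text_attribute"]
        forgery_types=[t for t in fake_cls.split("&") if t],
        # Per the field description, con_label 0 means forged.
        is_forged=rec["con_label"] == 0,
    )


samples = [parse_record(r) for r in json.loads(RAW)]
print(samples[0].forgery_types, samples[0].is_forged)
```

To read the real files, replace json.loads(RAW) with json.load on a file opened with encoding="utf-8", since the titles are Chinese text. The sketch deliberately does not interpret the index base of fake_text_pos or the coordinate order of bbox, because the description does not specify them.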
In addition, the DLSF dataset includes the following folders:
The videos folder contains the original news videos.
The images folder contains both the original and tampered news images.
The audio folder contains encoded audio data, stored in .npy format.
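Because image_path and video_path in the example record are relative paths beginning with ./Data/, it can help to check that they resolve against the local copy of the dataset before training. The following is a small sketch under that assumption; the helper names resolve and missing_files and the root directory used here are hypothetical.

```python
from pathlib import Path


def resolve(root, rel_path):
    """Join a record's relative path onto the dataset root directory."""
    return Path(root) / rel_path


def missing_files(root, records):
    """Return the image and video paths in records that do not exist on disk."""
    missing = []
    for rec in records:
        for key in ("image_path", "video_path"):
            path = resolve(root, rec[key])
            if not path.is_file():
                missing.append(path)
    return missing


# Paths copied from the example record; the root below is a placeholder.
record = {
    "video_path": "./Data/videos/o8DEmpgEh7zAIDkfmdBxdzxJEujAeBQvIxUPtg.mp4",
    "image_path": "./Data/images/7369231429441765899.jpg",
}
for path in missing_files("/nonexistent_dlsf_root", [record]):
    print("missing:", path)
```

The audio folder is left out of this check because the example record carries no audio path, so how the .npy files map to records is not stated here.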