Sequential Storytelling Image Dataset (SSID)

Name: Sequential Storytelling Image Dataset (SSID)
Creator: Zainy Malakan
License: https://creativecommons.org/licenses/by/4.0/

Citation Author(s): Zainy M. Malakan (Department of Computer Science and Software Engineering, The University of Western Australia, 35 Stirling Hwy, Crawley WA 6009, Australia)

Saeed Anwar (Information and Computer Science, King Fahad University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia)

Ghulam Mubashar Hassan (Department of Computer Science and Software Engineering, The University of Western Australia, 35 Stirling Hwy, Crawley WA 6009, Australia)

Ajmal Mian (Department of Computer Science and Software Engineering, The University of Western Australia, 35 Stirling Hwy, Crawley WA 6009, Australia)
Submitted by: Zainy Malakan
Last updated: Sat, 08/26/2023 - 10:38
DOI: 10.21227/dbr9-dq51
Data Format: *.csv (zip); *.json (zip); *.pickle (zip); *.npz (zip)
Research Article Link: Sequential Vision to Language as Story: A Storytelling Dataset and Benchmarking

Keywords: Storytelling, Visual Understanding Dataset, Image and Video Captioning, computer vision, Sequential Storytelling Image Dataset (SSID)

Abstract

Visual storytelling, also known as multi-image captioning, refers to describing a set of images rather than a single image. The Visual Storytelling Task (VST) takes a set of images as input and aims to generate a coherent story relevant to those images. We present the Sequential Storytelling Image Dataset (SSID), a new dataset for expressive and coherent story creation, consisting of open-source video frames accompanied by story-like annotations. We provide four annotations (i.e., stories) for each set of five images. The image sets were collected manually from publicly available videos in three domains (documentaries, lifestyle, and movies) and then annotated manually using Amazon Mechanical Turk. In summary, the SSID dataset comprises 17,365 images, forming a total of 3,473 unique sets of five images. Each set of images is associated with four ground truths, resulting in a total of 13,892 unique ground truths (i.e., written stories). Each ground truth is composed of five connected sentences written in the form of a story.
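The counts quoted in the abstract follow from simple multiplication, and the short Python check below reproduces them. The constant names are illustrative only. The total sentence count is derived here from the stated figures and is not a number reported on this page.

```python
# Figures quoted in the abstract.
NUM_SETS = 3_473          # unique sets of images
IMAGES_PER_SET = 5        # images in each set
STORIES_PER_SET = 4       # ground-truth stories per set
SENTENCES_PER_STORY = 5   # connected sentences per story

total_images = NUM_SETS * IMAGES_PER_SET
total_stories = NUM_SETS * STORIES_PER_SET

# Derived value (assumption: every story has exactly five sentences,
# as the abstract states); not reported in the source.
total_sentences = total_stories * SENTENCES_PER_STORY

print(f"images:    {total_images:,}")
print(f"stories:   {total_stories:,}")
print(f"sentences: {total_sentences:,}")
```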

Instructions:

The SSID dataset comprises 17,365 images, forming a total of 3,473 unique sets of five images. Each set of images is associated with four ground truths, resulting in a total of 13,892 unique ground truths (i.e., written stories). Each ground truth is composed of five connected sentences written in the form of a story. Please see the attached PDF file for additional instructions.
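After loading the data, it may be useful to confirm that each record matches the structure described above: five images, four stories, five sentences per story. The sketch below is a minimal example under stated assumptions. The `StorySet` class, its field names, and the `validate` helper are hypothetical and do not reflect the actual layout of the distributed files, which is documented in the attached PDF.

```python
from __future__ import annotations

from dataclasses import dataclass

IMAGES_PER_SET = 5
STORIES_PER_SET = 4
SENTENCES_PER_STORY = 5


@dataclass
class StorySet:
    """Hypothetical in-memory record; NOT the dataset's actual schema."""
    image_ids: list[str]
    stories: list[list[str]]  # each story is a list of sentences


def validate(record: StorySet) -> list[str]:
    """Return a list of structural problems (empty if the record is well formed)."""
    problems = []
    if len(record.image_ids) != IMAGES_PER_SET:
        problems.append(
            f"expected {IMAGES_PER_SET} images, got {len(record.image_ids)}"
        )
    if len(record.stories) != STORIES_PER_SET:
        problems.append(
            f"expected {STORIES_PER_SET} stories, got {len(record.stories)}"
        )
    for index, story in enumerate(record.stories):
        if len(story) != SENTENCES_PER_STORY:
            problems.append(
                f"story {index}: expected {SENTENCES_PER_STORY} sentences, "
                f"got {len(story)}"
            )
    return problems


# Synthetic placeholder record, made up purely for illustration.
example = StorySet(
    image_ids=[f"frame_{i}" for i in range(5)],
    stories=[[f"Sentence {j}." for j in range(5)] for _ in range(4)],
)
print(validate(example))
```

Returning a list of problems rather than raising on the first one makes it easier to report every malformed record in a single pass over the data.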

Funding Agency

This research was supported by the Australian Research Council.

Grant Number

FT210100268

Hi, I need access to this dataset for research purposes. Chiranjib

Chiranjib Bhat… Wed, 07/26/2023 - 08:37

Subject: Request for DiDeMoSV Story Continuation Datasets

Dear author of StoryDALL-E,

I am a researcher in the field of story generation. I am writing to request access to the DiDeMoSV story continuation datasets. These datasets would be of great value to my current research project as I strive to develop new story generation algorithms. I assure you that the datasets will be used solely for research purposes. Thank you for considering my request. I look forward to your positive response.

Best regards,
Ting Pan

Ting Pan Sat, 12/14/2024 - 07:14