Abstract

This dataset contains precomputed Faster R-CNN image features for MS-COCO and Flickr30K, which are all the data needed to reproduce the experiments in "Stacked Cross Attention for Image-Text Matching", our ECCV 2018 paper. We use the splits produced by Andrej Karpathy. The raw images can be downloaded from their original sources: http://nlp.cs.illinois.edu/HockenmaierGroup/Framing_Image_Description/KC..., http://shannon.cs.illinois.edu/DenotationGraph/ and http://mscoco.org/.

The precomputed MS-COCO image features originally come from https://github.com/peteanderson80/bottom-up-attention. The Flickr30K image features were extracted from the raw Flickr30K images using the bottom-up attention model from the same repository.

Instructions:

The image features are stored in the ./data directory, and vocabulary mapping files are stored in the ./vocab directory.

The prefixes 'train', 'dev', and 'test' denote the training, validation, and test sets, respectively. For the MS-COCO dataset, the prefix 'testall' denotes the complete test set, and the prefix 'test' denotes a subset of the test set.
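
As a rough illustration of the prefix convention, the sketch below builds the path of a feature file for a given split. This page does not list the actual file names, so the sub-directory pattern `<dataset>_precomp`, the file name `<prefix>_ims.npy`, the dataset keys `coco` and `f30k`, and the helper names `feature_path` and `load_features` are all assumptions made for illustration; adjust them to match the files you actually have.

```python
from pathlib import Path

# Split prefixes described in the instructions. The dataset keys "coco" and
# "f30k" are hypothetical labels chosen for this sketch.
SPLIT_PREFIXES = {
    "coco": ("train", "dev", "test", "testall"),  # 'testall' exists only for MS-COCO
    "f30k": ("train", "dev", "test"),
}


def feature_path(dataset, prefix, root="./data"):
    """Return the assumed path of the image-feature file for one split.

    Assumed layout (not confirmed by this page):
        <root>/<dataset>_precomp/<prefix>_ims.npy
    """
    if dataset not in SPLIT_PREFIXES:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if prefix not in SPLIT_PREFIXES[dataset]:
        raise ValueError(f"prefix {prefix!r} is not defined for {dataset!r}")
    return Path(root) / f"{dataset}_precomp" / f"{prefix}_ims.npy"


def load_features(dataset, prefix, root="./data"):
    """Load one split's features, assuming they are stored as a NumPy array."""
    import numpy as np  # imported here so feature_path works without NumPy

    return np.load(feature_path(dataset, prefix, root))
```

For example, `feature_path("coco", "testall")` points at the complete MS-COCO test set under the assumed layout, while requesting `"testall"` for Flickr30K raises a `ValueError`, since that prefix is only described for MS-COCO.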

Comments

I need the dataset for project

Submitted by Harrison Anaele on Fri, 08/30/2024 - 20:51

Dataset Files

Files have not been uploaded for this dataset

SCAN Faster R-CNN Image Features