Cross-modal Retrieval; Image-text Matching; Multi-modal; Deep Learning
This dataset contains precomputed Faster R-CNN image features for MS-COCO and Flickr30K, which is all the data needed to reproduce the experiments in our ECCV 2018 paper, "Stacked Cross Attention for Image-Text Matching". We use the dataset splits produced by Andrej Karpathy.