Standard Dataset
MultiModal dataset from Instagram
- Submitted by: Qi Yang
- Last updated: Tue, 05/17/2022 - 22:17
- DOI: 10.21227/j1rf-fa09
Abstract
We collected 248,166 public microblogs from Instagram using 97 hashtags selected from the "Top 100" hashtags. After removing duplicate hashtags within each sample and dropping microblogs that have no text, the final collection contains 56,861 microblogs, each with both text and an image. We call it MultiModal data from Instagram (MM-INS).
This dataset was crawled from Instagram with Instaloader (https://instaloader.github.io/). Because the raw dataset is too large to upload in full, we provide three sub-datasets without preprocessing ("#beach", "#cat", "#dog") and the corresponding preprocessed sub-datasets ("beach", "cat", "dog"), from which images without text have been removed. We hope these samples are helpful for your research, and we are open to academic cooperation.
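The two preprocessing steps described above (removing duplicate hashtags within a sample and dropping microblogs without text) could be sketched in Python roughly as follows. This is only an illustration, not the script used to build MM-INS: the record layout (a dict with "image" and "caption" keys), the function names, the case-insensitive hashtag comparison, and the rule that an empty or missing caption means "no text" are all assumptions.

```python
import re
from typing import Dict, List, Optional

# Assumption: a hashtag is "#" followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def dedupe_hashtags(caption: str) -> List[str]:
    """Return the hashtags in a caption, lowercased, in first-seen order,
    with duplicates removed."""
    seen = set()
    tags = []
    for tag in HASHTAG_RE.findall(caption):
        key = tag.lower()
        if key not in seen:
            seen.add(key)
            tags.append(key)
    return tags


def preprocess(posts: List[Dict[str, Optional[str]]]) -> List[dict]:
    """Drop posts whose caption is empty or missing, and attach a
    deduplicated hashtag list to each remaining post."""
    kept = []
    for post in posts:
        caption = (post.get("caption") or "").strip()
        if not caption:
            # Hypothetical rule: no caption means "no text", so skip the post.
            continue
        kept.append({
            "image": post.get("image"),
            "caption": caption,
            "hashtags": dedupe_hashtags(caption),
        })
    return kept


if __name__ == "__main__":
    sample = [
        {"image": "a.jpg", "caption": "Sunny day #beach #Beach #sea"},
        {"image": "b.jpg", "caption": ""},
    ]
    print(preprocess(sample))
```

Whether a caption that consists only of hashtags should count as text is not stated in the description; the sketch keeps such posts.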
Dataset Files
- Microblogs on Instagram with the "beach" hashtag: #beach.7z (255.18 MB)
- Microblogs on Instagram with the "cat" hashtag: #cat.7z (367.29 MB)
- Microblogs on Instagram with the "dog" hashtag: #dog.7z (130.01 MB)
- amazing.zip (56.90 MB)
- beach.zip (76.00 MB)
- cat.zip (89.29 MB)
- dog.zip (78.93 MB)