Qi Yang
Tue, 07/23/2019 - 00:13
We collected 248,166 public microblogs using 97 hashtags selected from the "Top 100" on Instagram. The final collection, called MultiModal data from Instagram (MM-INS), contains 56,861 microblogs that each include both text and an image. We removed duplicate hashtags within each sample and dropped microblogs that had no text.
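The two cleaning steps described above could be sketched roughly as follows. This is an illustration only, not the authors' actual pipeline: the `Microblog` record, its field names, and the `clean_microblogs` function are hypothetical, and treating hashtags as duplicates regardless of letter case is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Microblog:
    """Hypothetical record for one crawled post (field names are assumptions)."""
    text: Optional[str]
    image_path: Optional[str]
    hashtags: List[str]


def clean_microblogs(posts: Iterable[Microblog]) -> List[Microblog]:
    """Keep posts that have both text and an image, with hashtags deduplicated."""
    cleaned = []
    for post in posts:
        # Drop microblogs with no text (empty or whitespace only).
        if not post.text or not post.text.strip():
            continue
        # Keep only multimodal samples, i.e. those that also have an image.
        if not post.image_path:
            continue
        # Remove duplicate hashtags within the sample, preserving first-seen order.
        # Assumption: "Beach" and "beach" count as the same hashtag.
        seen = set()
        unique_tags = []
        for tag in post.hashtags:
            key = tag.lower()
            if key not in seen:
                seen.add(key)
                unique_tags.append(tag)
        cleaned.append(Microblog(post.text, post.image_path, unique_tags))
    return cleaned
```

Calling `clean_microblogs` on a list of records returns new records and leaves the input list unchanged, so the raw crawl can be kept alongside the filtered collection.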


This dataset is a collection of microblogs crawled from Instagram using the Instaloader API. Because the raw dataset is too large to upload in full, we provide three representative sub-datasets: "#beach", "#cat", and "#dog". We hope these samples are helpful for your research, and we are open to academic cooperation.


[1] Qi Yang, "MultiModal dataset from Instragram", IEEE Dataport, 2019. [Online]. Available: Accessed: Aug. 23, 2019.
