This dataset builds on the GeoCOV19Tweets Dataset. The original work by Lamsal, R. runs network analysis on a similar dataset to understand the underlying relationships between countries and hashtags. That work analyzed roughly 300k [country, hashtag] relations covering 190 countries and territories and 5055 unique hashtags. This dataset contains about three times as many relationships.
This dataset provides [place, hashtag] relationships in a comma-separated values (CSV) file. Each line represents one relationship. The CSV file can be used directly for your research needs.
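As a minimal sketch of reading the file, the snippet below loads [place, hashtag] rows and counts how often each pair occurs, which could serve as edge weights for a network analysis. The column order (place first, hashtag second), the absence of a header row, and the sample rows are assumptions for illustration; check them against the actual file before use.

```python
import csv
import io
from collections import Counter


def load_relations(handle):
    """Read (place, hashtag) pairs from an open CSV handle.

    Assumes column 0 is the place and column 1 is the hashtag.
    Rows with fewer than two columns or empty values are skipped.
    """
    relations = []
    for row in csv.reader(handle):
        if len(row) < 2:
            continue
        place, hashtag = row[0].strip(), row[1].strip()
        if place and hashtag:
            relations.append((place, hashtag))
    return relations


# Hypothetical sample rows standing in for the real CSV file.
sample = io.StringIO("Kathmandu,covid19\nLondon,lockdown\nKathmandu,covid19\n")
relations = load_relations(sample)

# Count repeated pairs; each count can act as an edge weight.
edge_weights = Counter(relations)
```

To read the real file, replace the `io.StringIO` sample with `open(path, newline="", encoding="utf-8")`, where `path` points to the downloaded CSV.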
However, to change the place entity from city (the dataset currently uses the ["place"]["name"] object) to country, use the ["place"]["country"] object instead. A sample script is provided with this dataset. The script takes a list of tweet IDs from a CSV file and hydrates the IDs to extract [place, hashtag] relationships. The script is written for twarc.
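The switch between city and country can be illustrated with a short sketch. This is not the script shipped with the dataset. It assumes the tweet has already been hydrated into a Twitter API v1.1 style JSON object (a Python dict), where hashtags sit under ["entities"]["hashtags"] with a "text" key. The function name `extract_relations` and the `place_field` parameter are hypothetical.

```python
def extract_relations(tweet, place_field="name"):
    """Return (place, hashtag) pairs for one hydrated tweet dict.

    place_field="name" gives the city-level value from ["place"]["name"];
    place_field="country" gives the value from ["place"]["country"].
    Tweets without a place or without hashtags yield an empty list.
    """
    place = (tweet.get("place") or {}).get(place_field)
    if not place:
        return []
    hashtags = (tweet.get("entities") or {}).get("hashtags") or []
    return [(place, tag["text"]) for tag in hashtags if tag.get("text")]


# Hypothetical hydrated tweet, trimmed to the fields used above.
sample_tweet = {
    "place": {"name": "Kathmandu", "country": "Nepal"},
    "entities": {"hashtags": [{"text": "covid19"}, {"text": "lockdown"}]},
}

city_pairs = extract_relations(sample_tweet)
country_pairs = extract_relations(sample_tweet, place_field="country")
```

Here `city_pairs` holds the city-level relationships and `country_pairs` the country-level ones. In practice the tweet dicts would come from hydrating the tweet IDs with twarc rather than from a hand-written sample.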