Dataset of Disinformation on X in Japan

Citation Author(s):
Shuhei Ippa, Institute of Information Security
Takao Okubo, Institute of Information Security
Masaki Hashimoto, Kagawa University
Submitted by:
Masaki Hashimoto
Last updated:
Fri, 12/27/2024 - 01:19
DOI:
10.21227/5sqm-m222

Abstract 

This study analyzes and characterizes the relationship between human emotions and other elements (social bots and echo chambers) that are major factors in the spread of information, including disinformation and misinformation. The dataset consists of a CSV file of posts that match the target word and period, and a database of the accounts that made those posts.

 

The following is an overall description of the objects and what users can expect to gain by downloading them.

・Post.csv: Post data containing the word "PASCO crickets" from February 26, 2023 to March 23, 2023

・Account.db: Accounts that posted (including reposts) or were reposted in the PASCO case

 

The size of each object is as follows.

・Post.csv (about 63.4MB)

・Account.db (about 34.6MB)

Instructions: 

1) IPPA_Tweet_Collection

- Collects the target posts (including reposts) and saves them as a CSV file. An X API account is required. Enter the following command to start the script.

python IPPA_Tweet_Collection.py KeyWord 2024-01-01-2024-12-31
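The period argument above packs the start and end dates into a single `YYYY-MM-DD-YYYY-MM-DD` string. A minimal sketch of how such an argument could be split (the helper name is hypothetical; the script's actual internals are not documented here):

```python
from datetime import datetime

def parse_period(period: str):
    """Split a 'YYYY-MM-DD-YYYY-MM-DD' period argument into start and end
    datetimes. Illustrative helper only; not taken from the actual script."""
    parts = period.split("-")
    start = datetime(int(parts[0]), int(parts[1]), int(parts[2]))
    end = datetime(int(parts[3]), int(parts[4]), int(parts[5]))
    return start, end
```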

 

2) IPPA_Account_Registration

- To calculate bot scores with Botometer, the accounts must first be registered in the DB. Enter the following command to start the script.

python Account_Registration.py KeyWord
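Conceptually, this registration step inserts each posting account into the SQLite database so that a bot score can be attached later. A minimal sketch, assuming an illustrative `account` table (the schema and column names are assumptions, not the dataset's actual schema):

```python
import sqlite3

def register_accounts(db_path, accounts):
    """Insert (user_id, screen_name) pairs into the DB so bot scores can be
    attached later. Schema is illustrative, not the dataset's actual one."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS account ("
        "user_id TEXT PRIMARY KEY, screen_name TEXT, bot_score REAL)"
    )
    # INSERT OR IGNORE keeps re-runs idempotent for already-registered accounts
    con.executemany(
        "INSERT OR IGNORE INTO account (user_id, screen_name) VALUES (?, ?)",
        accounts,
    )
    con.commit()
    return con
```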

 

3) IPPA_Botometer

- Calculates the bot score using Botometer and updates the DB. To use Botometer, you need a Rapid API account, and its credentials must be saved as an ini file before running. Enter the following command to start the script.

python IPPA_Botometer.py
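The ini file mentioned above can be read with Python's standard `configparser`. The section and key names below are assumptions for illustration, not the script's documented format; the real script would read the file from disk (e.g. `config.read(path)`), while `read_string` is used here to keep the sketch self-contained:

```python
import configparser

# Assumed ini layout (section/key names are illustrative):
EXAMPLE_INI = """\
[rapidapi]
api_key = YOUR_RAPID_API_KEY
"""

def load_rapidapi_key(ini_text):
    """Parse the Rapid API key out of ini-formatted text."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    return config["rapidapi"]["api_key"]
```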

 

4) IPPA_Integration

- Retrieves the bot scores from the DB and adds them to the CSV file that contains the collected posts. Enter the following command to start the script.

python IPPA_Integration.py KeyWord
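The integration step amounts to a lookup join: for each post row, fetch the author's bot score from the DB and append it as a new column. A minimal sketch, with illustrative column names (`user_id`, `bot_score`) that are assumptions rather than the dataset's actual schema:

```python
import sqlite3

def add_bot_scores(con, rows):
    """Attach a bot_score field to each post row by looking up the author
    in the account table. Column names are illustrative assumptions."""
    out = []
    for row in rows:
        hit = con.execute(
            "SELECT bot_score FROM account WHERE user_id = ?",
            (row["user_id"],),
        ).fetchone()
        # None marks authors not found (or not yet scored) in the DB
        out.append(dict(row, bot_score=hit[0] if hit else None))
    return out
```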

 

5) IPPA_BERT

- Adds the Emotional Score to the CSV file that contains the collected posts. Enter the following command to start the script.

python IPPA_BERT.py
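Structurally, this step maps each post's text through a scoring model and appends the result as a column. In the sketch below, `score_fn` is a hypothetical stand-in for the BERT-based scorer the script uses (loading an actual model is out of scope here); any callable mapping text to a float fits the interface:

```python
def add_emotion_scores(rows, score_fn):
    """Append an emotion_score field to each post row.

    score_fn is a placeholder for the BERT-based scorer; here any
    text -> float callable works (hypothetical interface).
    """
    return [dict(row, emotion_score=score_fn(row["text"])) for row in rows]
```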

 

6) IPPA_K_Core

- Extracts reposts from the CSV file that contains the collected posts and performs k-core decomposition. Enter the following command to start the script.

python IPPA_K_Core.py
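For reference, the k-core of a graph is the maximal subgraph in which every node has degree at least k; it is computed by repeatedly deleting nodes of degree below k. A minimal pure-Python sketch on an undirected repost graph (the actual script may use a graph library instead):

```python
from collections import defaultdict

def k_core(edges, k):
    """Return the node set of the k-core of an undirected graph given as
    (u, v) edge pairs: repeatedly remove nodes with degree < k."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    changed = True
    while changed:
        changed = False
        for node in list(adj):
            if len(adj[node]) < k:
                # remove the node and detach it from its neighbours
                for nbr in adj.pop(node):
                    adj[nbr].discard(node)
                changed = True
    return set(adj)
```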

 

7) IPPA_Main

- Runs scripts 8) to 15) all at once. Enter the following command to start the script.

python IPPA_Main.py

 

8) IPPA_Analysis_Cluster

- Creates a CSV file of reposts for each community.

 

9) IPPA_Analysis_DateTime

- Creates a CSV file showing the number of reposts overall and for each community, repost relationships (human, social bot), the formation of echo chambers, and the transition of Emotional Scores.
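One building block of such a time-series file is a per-day repost count. A minimal sketch, assuming illustrative field names (`is_repost`, ISO-formatted `created_at`) that are not the dataset's documented schema:

```python
from collections import Counter
from datetime import datetime

def reposts_per_day(rows):
    """Count reposts per calendar day. Field names are illustrative."""
    counts = Counter()
    for row in rows:
        if row.get("is_repost"):
            day = datetime.fromisoformat(row["created_at"]).date()
            counts[day] += 1
    return dict(counts)
```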

 

10) IPPA_Visualization_ClusterPercentage

- Visualizes the percentage of reposts for each community as a pie chart.
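The numbers behind such a pie chart are simply per-community shares of the total repost count. A minimal sketch of that computation (the plotting itself is omitted):

```python
from collections import Counter

def community_percentages(community_labels):
    """Percentage of reposts per community, given one community label
    per repost. This is the data a pie chart would display."""
    counts = Counter(community_labels)
    total = sum(counts.values())
    return {c: 100.0 * n / total for c, n in counts.items()}
```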

 

11) IPPA_Visualization_ClusterRelationship

- Visualizes the repost relationships between communities as a heat map.
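A heat map of repost relationships is backed by a community-by-community count matrix: entry (i, j) counts reposts whose reposter is in community i and whose original author is in community j. A minimal sketch of building that matrix (directionality is an assumption for illustration):

```python
def repost_matrix(repost_pairs, communities):
    """Count matrix over (reposter community, original-author community)
    pairs; this is the grid a heat map would render."""
    index = {c: i for i, c in enumerate(communities)}
    matrix = [[0] * len(communities) for _ in communities]
    for src, dst in repost_pairs:
        matrix[index[src]][index[dst]] += 1
    return matrix
```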

 

12) IPPA_Visualization_ClusterSentimental

- Visualizes the Emotional Scores for each community as a box plot.

 

13) IPPA_Visualization_DateTime

- Visualizes the number of reposts and repost relationships (human, social bot) overall and for each community, the formation of echo chambers, and the transition of Emotional Scores. Tests the relationship between social bots, echo chambers, and Emotional Scores using time-series cross-correlation.
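Time-series cross-correlation compares one series against a lagged copy of another, so a peak at a nonzero lag suggests one signal leads the other. A minimal pure-Python version for two equal-length series (the script's exact statistical procedure is not specified here):

```python
def cross_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag] for two equal-length
    series; positive lag means y is shifted earlier relative to x."""
    if lag >= 0:
        a, b = x[: len(x) - lag], y[lag:]
    else:
        a, b = x[-lag:], y[: len(y) + lag]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5
```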

 

14) IPPA_Visualization

- Performs common processing shared by the visualizations.

 

15) IPPA_Common

- Performs common processing required for creating CSV files and visualizations.

Dataset Files
