Twitter Data 2021 (1 year)

Citation Author(s):
Prince of Songkla University
Submitted by:
Anny Mardjo
Last updated:
Sat, 06/25/2022 - 22:31


In recent years, Bitcoin and other cryptocurrencies have increasingly been considered an investment option in emerging markets. However, their erratic price behavior has discouraged some potential investors. To gain insight into this behavior and the associated price fluctuations, past studies have examined the correlation between Twitter sentiment and Bitcoin behavior. Most of them focused exclusively on that relationship rather than on the Twitter sentiment analysis itself. Finding the most suitable classification algorithm for sentiment analysis of this kind of data is challenging. Because Twitter data is enormous and unlabeled, manually labeling it for a supervised sentiment analysis approach, which has been reported to outperform unsupervised ones, is time-consuming and expensive. We therefore propose HyVADRF: a Hybrid VADER – Random Forest and Grey Wolf Optimizer model. The semantic and rule-based VADER was used to calculate polarity scores and classify sentiments, avoiding the cost of manual labeling, while Random Forest served as the supervised classifier. Furthermore, given the massive size of the Twitter data (over 3.6 million tweets), our study analyzed various dataset sizes, since dataset size affects the model's learning process. Lastly, Grey Wolf Optimizer parameter tuning was conducted to optimize the classifier's performance. The results show that 1) the HyVADRF model achieved an accuracy of 75.29%, precision of 70.22%, recall of 87.70%, and F1-score of 78%; 2) the most suitable dataset size is 90% of the total collected tweets (n=1,249,060); and 3) the standard deviations were 0.0008 for accuracy and F1-score and 0.0011 for precision and recall. Hence, the HyVADRF model consistently delivers stable results.
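As a quick consistency check on the reported metrics, the F1-score can be recomputed from the stated precision and recall. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the function name `f1_score` is introduced here for illustration and is not taken from the authors' code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, converted from percentages to fractions.
reported_precision = 0.7022
reported_recall = 0.8770

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.4f}")
```

Rounded to two decimal places, the recomputed value matches the reported F1-score of 78%.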


This file can be opened using Excel or R.
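For readers who prefer Python, the following is a minimal sketch for previewing the file. It assumes the download is a comma-separated text file encoded as UTF-8, which this page does not confirm; the helper name `preview_csv` and the file name in the usage comment are hypothetical.

```python
import csv
from itertools import islice


def preview_csv(path, n_rows=5, encoding="utf-8"):
    """Return the header row and the first n_rows data rows of a CSV file.

    Assumption: the file is comma-separated with a header on its first line.
    """
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle)
        header = next(reader, [])          # empty list if the file is empty
        rows = list(islice(reader, n_rows))  # read only the first few rows
    return header, rows


# Usage (hypothetical file name; replace with the downloaded file):
# header, rows = preview_csv("tweets_2021.csv")
# print(header)
```

Reading only the first few rows keeps memory use low, which matters for a file containing millions of tweets.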