AI-based Automated Extraction of Entities, Entity Categories and Sentiments on COVID-19 Situation

Citation Author(s):
Federal Government
Submitted by:
Fahim Sufi
Last updated:
Fri, 11/26/2021 - 07:24
Data Format:




With the proliferation of social media users, an enormous amount of social media content is generated every day on the COVID-19 situation. Artificial Intelligence (AI) based in-depth analysis of this content would allow a strategic decision maker to obtain evidence-based answers to complex queries such as “Which city in Australia is associated with the most negative tweets on the Corona situation (i.e., City and Country-Region Entity)?”, “Which person is most hated in the US for COVID-related issues (i.e., Person Name Entity)?” or “Which COVID-19 vaccine has been gaining popularity in the last 3 weeks (i.e., Positive sentiment on Product Entity)?”. To this end, we used a new, fully automated AI-based algorithm that combines sentiment analysis, entity recognition and translation. The data obtained through this methodology allow sentiment analysis on COVID-19 from 24 different perspectives. The reported data provide extensive knowledge and insights on social media feeds related to COVID-19 in 110 languages. We deployed and tested this algorithm on live Twitter feeds from July 15, 2021 to August 10, 2021. During this 27-day period, the deployed solution successfully analyzed 1,866 messages and detected 5,016 entities. These 5,016 detected entities were automatically classified into 24 different entity types. Of the 1,866 tweets analyzed, 990 contained one or more location entities. In total, 1,322 location entities were detected, falling under the city, continent, country region, language and state entity categories.





The dataset contains the following 11 fields:


  1. TweetID (ID of the Tweet text): Text String
  2. Entity Type (category of the entity): Text String
  3. Entity Value (detected entity): Any
  4. Entity Score (confidence of the detected entity): Decimal (0 to 1)
  5. Sentiment (result of sentiment analysis): Text (positive, negative, neutral, mixed)
  6. Positive Confidence (confidence strength that the sentiment is positive): Decimal (0 to 1)
  7. Negative Confidence (confidence strength that the sentiment is negative): Decimal (0 to 1)
  8. Neutral Confidence (confidence strength that the sentiment is neutral): Decimal (0 to 1)
  9. Specified Location (user-specified location of the Twitter user): Text String
  10. Time (time of tweet): Date/Time
  11. Retweets (number of retweets): Numeric
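
As an illustration of how these fields could be used to answer a query like "which city is associated with the most negative tweets", the following is a minimal Python sketch. It is not the authors' algorithm. It assumes the data have been exported to a CSV file whose column headers match the field names listed above, and that city entities carry the Entity Type value "City"; the file name, the exact header spellings, the entity type label and the 0.5 score threshold are all assumptions that may need adjusting to the actual download.

```python
import csv
from collections import Counter


def load_rows(path):
    """Read the dataset from a CSV file (assumed export format) into dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def negative_tweets_by_entity(rows, entity_type="City", min_score=0.5):
    """Count distinct negative tweets per entity value of one entity type.

    Column names ("Entity Type", "Entity Value", ...) are assumed to match
    the field list; min_score is an arbitrary illustrative threshold.
    """
    seen = set()        # (TweetID, Entity Value) pairs already counted
    counts = Counter()
    for row in rows:
        if row["Entity Type"].strip().lower() != entity_type.lower():
            continue
        if row["Sentiment"].strip().lower() != "negative":
            continue
        if float(row["Entity Score"]) < min_score:
            continue
        key = (row["TweetID"], row["Entity Value"])
        if key in seen:  # same entity detected twice in one tweet
            continue
        seen.add(key)
        counts[row["Entity Value"]] += 1
    return counts


# Hypothetical usage; "covid_entities.csv" is a placeholder file name:
# rows = load_rows("covid_entities.csv")
# print(negative_tweets_by_entity(rows).most_common(5))
```

Counting each (TweetID, Entity Value) pair only once keeps a tweet that mentions the same city several times from inflating that city's total. Changing `entity_type` to another category, or the sentiment filter to positive, would adapt the same sketch to the person and product queries mentioned in the abstract.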