CAO System Emoticon Parts Dataset with Emotion Labels

Citation Author(s): Michal Ptaszynski (Kitami Institute of Technology)
Submitted by: Michal Ptaszynski
Last updated: Thu, 03/14/2019 - 04:03
DOI: 10.21227/47f4-kc44
Data Format: .zip
Links: CAO: Emoticon Analysis System / Emoticon Database

Keywords: emoticon, natural language processing

Abstract

This dataset served as the basis for CAO, a system for the analysis of emoticons in Japanese online communication, developed by Ptaszynski et al. (2010). Emoticons are strings of symbols widely used in text-based online communication to convey user emotions. The dataset contains: 1) a predetermined raw emoticon database of over ten thousand emoticon samples extracted from the Web, and 2) emoticon parts automatically extracted from the raw emoticons and grouped into semantic areas such as “mouths” and “eyes”. Both the raw emoticons and the emoticon areas are automatically annotated with emotions according to their co-occurrence in the database.

Instructions:

We present a database of emoticons, face marks widely used to convey emotions in text-based online communication. The database was created by gathering emoticons from numerous dictionaries of face marks and online jargon. Inconsistencies among the emotion classifications given by the various dictionaries are resolved by processing them with a previously developed affect analysis system. From the emoticon database annotated automatically in this way, we extract patterns of semantic areas of emoticons, such as "eyes" and "mouths". Finally, we annotate the semantic areas based on co-occurrence statistics and the theory of kinesics.
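
As a rough illustration of the co-occurrence step, the sketch below counts how often each semantic area appears in emoticons carrying a given emotion label and converts the counts to relative frequencies. It is a minimal sketch under stated assumptions, not the procedure used in CAO: the toy records, the emotion label names, the `eyes:`/`mouth:` key prefixes, and the function `annotate_parts` are all hypothetical, and the actual file layout of the dataset is not assumed here.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (eyes, mouth, emotion labels of the raw emoticon).
# These rows are invented for illustration and are NOT taken from the dataset.
TOY_RECORDS = [
    (("^", "^"), "_", ["joy"]),
    (("^", "^"), "o", ["joy", "surprise"]),
    (("T", "T"), "_", ["sorrow"]),
    ((";", ";"), "_", ["sorrow"]),
]


def annotate_parts(records):
    """Return {part: {emotion: relative frequency}} from co-occurrence counts."""
    counts = defaultdict(Counter)
    for eyes, mouth, emotions in records:
        # Each semantic area inherits every label of the emoticon it occurs in.
        for part in ("eyes:" + "".join(eyes), "mouth:" + mouth):
            counts[part].update(emotions)

    scores = {}
    for part, counter in counts.items():
        total = sum(counter.values())
        # Normalise raw counts so the scores of each part sum to 1.
        scores[part] = {emotion: n / total for emotion, n in counter.items()}
    return scores


scores = annotate_parts(TOY_RECORDS)
for part in sorted(scores):
    print(part, scores[part])
```

With real data, the toy records would be replaced by the eye and mouth areas and emotion labels read from the dataset files. Relative frequency is only one possible scoring choice and is used here for simplicity.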