CAO System Emoticon Parts Dataset with Emotion Labels

Citation Author(s):
Kitami Institute of Technology
Submitted by:
Michal Ptaszynski
Last updated:
Thu, 03/14/2019 - 00:03
Data Format:


This dataset served as the basis for CAO, a system for the analysis of emoticons in Japanese online communication developed by Ptaszynski et al. (2010). Emoticons are strings of symbols widely used in text-based online communication to convey user emotions. The database contains: 1) a predetermined raw emoticon database of over ten thousand emoticon samples extracted from the Web, and 2) emoticon parts obtained by automatically dividing the raw emoticons into semantic areas representing “mouths” or “eyes”. Both the raw emoticons and the emoticon areas are automatically annotated with emotions according to their co-occurrence in the database.


We present a database of emoticons, that is, face marks widely used to convey emotions in text-based online communication. The database is created by gathering emoticons from numerous dictionaries of face marks and online jargon. Inconsistencies in the emotion classifications provided by the various dictionaries are resolved by processing them with a previously developed affect analysis system. From the emoticon database annotated automatically in this way, we extract patterns of semantic areas of emoticons, such as "eyes" and "mouths". Finally, we annotate the semantic areas based on co-occurrence statistics and the theory of kinesics.
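
To make the idea of co-occurrence-based annotation more concrete, the following is a minimal Python sketch, not the procedure used by CAO itself. The sample emoticons and labels are hand-made toy data rather than entries from the dataset, and the function names split_face, annotate_parts and ranked are hypothetical. The splitting rule (first and last inner symbols are eyes, everything between them is the mouth) is a simplifying assumption.

```python
from collections import Counter, defaultdict

# Toy, hand-made samples (NOT taken from the dataset): (emoticon, emotion label).
SAMPLES = [
    ("(^_^)", "joy"),
    ("(^o^)", "joy"),
    ("(T_T)", "sadness"),
    ("(ToT)", "sadness"),
    ("(>_<)", "anger"),
]


def split_face(emoticon):
    """Naively split a bracketed face into (left eye, mouth, right eye).

    Assumption for illustration only: the first and last symbols inside the
    brackets are the eyes, and whatever lies between them is the mouth.
    Returns None when the face is too short to split this way.
    """
    inner = emoticon.strip("()")
    if len(inner) < 3:
        return None
    return inner[0], inner[1:-1], inner[-1]


def annotate_parts(samples):
    """Count how often each eye pair and each mouth co-occurs with each emotion."""
    eyes = defaultdict(Counter)
    mouths = defaultdict(Counter)
    for emoticon, emotion in samples:
        parts = split_face(emoticon)
        if parts is None:
            continue
        left, mouth, right = parts
        eyes[left + right][emotion] += 1
        mouths[mouth][emotion] += 1
    return eyes, mouths


def ranked(counter):
    """Turn raw co-occurrence counts into relative frequencies, highest first."""
    total = sum(counter.values())
    return [(emotion, count / total) for emotion, count in counter.most_common()]


eyes, mouths = annotate_parts(SAMPLES)
print(ranked(eyes["^^"]))
print(ranked(mouths["_"]))
```

In this toy data the eye pair "^^" occurs only with "joy", while the mouth "_" is spread across several emotions, which illustrates why annotating each semantic area separately can be informative.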