CAO System Emoticon Parts Dataset with Emotion Labels
- Submitted by: Michal Ptaszynski
- Last updated: Thu, 03/14/2019 - 00:03
- DOI: 10.21227/47f4-kc44
Abstract
The presented dataset has been used as a basis for CAO, a system for the analysis of emoticons in Japanese online communication developed by Ptaszynski et al. (2010). Emoticons are strings of symbols widely used in text-based online communication to convey user emotions. The database contains: 1) a predetermined raw emoticon database containing over ten thousand emoticon samples extracted from the Web, and 2) emoticon parts obtained by automatically dividing the raw emoticons into semantic areas representing “mouths” or “eyes”. Both the raw emoticons and the emoticon areas are automatically annotated with emotions according to their co-occurrence in the database.
We present a database of emoticons, face marks widely used to convey emotions in text-based online communication. The database was created by gathering emoticons from numerous dictionaries of face marks and online jargon. Inconsistencies in the emotion classifications provided by the various dictionaries are resolved by processing the emoticons with a previously developed affect analysis system. With the emoticon database annotated automatically in this way, we extract from it patterns of semantic areas of emoticons, such as "eyes" and "mouths". Finally, we annotate the semantic areas based on co-occurrence statistics and the theory of kinesics.
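The co-occurrence step described above can be illustrated with a minimal sketch. This is not the CAO implementation: the toy emoticon list, the emotion labels, and the function names `split_triplet` and `annotate_parts` are assumptions made for illustration, and the split handles only a simple three-character eye-mouth-eye face enclosed in parentheses.

```python
from collections import Counter, defaultdict

# Hypothetical toy sample of (emoticon, emotion label) pairs.
# These are illustrative values, not entries taken from the dataset.
SAMPLE = [
    ("(^_^)", "joy"),
    ("(^o^)", "joy"),
    ("(T_T)", "sorrow"),
    ("(ToT)", "sorrow"),
    ("(^_^)", "joy"),
]


def split_triplet(emoticon):
    """Split a '(eye mouth eye)' face into its three parts.

    Assumes exactly three characters between the parentheses;
    returns None for anything else.
    """
    if len(emoticon) == 5 and emoticon[0] == "(" and emoticon[-1] == ")":
        left_eye, mouth, right_eye = emoticon[1:4]
        return left_eye, mouth, right_eye
    return None


def annotate_parts(samples):
    """Count how often each eye pair and each mouth co-occurs with each emotion."""
    eyes = defaultdict(Counter)
    mouths = defaultdict(Counter)
    for emoticon, emotion in samples:
        parts = split_triplet(emoticon)
        if parts is None:
            continue  # skip faces the simple splitter cannot handle
        left_eye, mouth, right_eye = parts
        eyes[left_eye + right_eye][emotion] += 1
        mouths[mouth][emotion] += 1
    return eyes, mouths


eyes, mouths = annotate_parts(SAMPLE)
print(eyes["^^"].most_common(1))  # most frequent emotion for the "^^" eye pair
print(dict(mouths["_"]))          # emotion counts for the "_" mouth
```

In this toy sample the "_" mouth appears with more than one emotion, which is why counting over the whole database, rather than relying on a single example, is needed to label a part.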
Dataset Files
- Unprocessed emoticons in the form I downloaded them from the Internet, with their frequencies within the database, divided by em… emoticon-raw-stats.zip (168.11 kB)
- Emoticon mouths with their frequencies within the database, divided by emotion types. Zipped archive. emoticon-mouths-stats.zip (8.68 kB)
- Emoticon eyes with their frequencies within the database, divided by emotion types. Zipped archive. emoticon-eyes-stats.zip (11.11 kB)
- Emoticon triplets (eye-mouth-eye) divided by emotion types. Zipped archive. emoticon-triplets.zip (43.04 kB)
- Emoticon triplets (eye-mouth-eye) with their frequencies within the database, divided by emotion types. Zipped archive. emoticon-triplets-stats.zip (31.89 kB)
- All unique characters appearing in all emoticons used in the system, with their frequencies within the database. emoticon-unique-chars-with-frequencies.txt (4.47 kB)
- All unique emoticon triplets used in the system, sorted by length (number of characters). emoticons-sorted-by-length.txt (29.84 kB)
- A simple Perl script that detects the presence of emoticons (or emoticon bundle). Should work for most cases. It uses a simplifi… emodetector.zip (4.47 kB)
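The internal layout of the zipped archives is not documented on this page, so a cautious first step after downloading is to list what an archive contains before parsing anything. The sketch below does only that, using Python's standard `zipfile` module; the helper name `list_archive` and the local path in `ARCHIVE` are assumptions for illustration.

```python
import os
import zipfile


def list_archive(path):
    """Return (member name, uncompressed size in bytes) for each entry in a zip file."""
    with zipfile.ZipFile(path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


# Assumed local path; adjust to wherever the archive was saved.
ARCHIVE = "emoticon-triplets.zip"

if os.path.exists(ARCHIVE):
    for name, size in list_archive(ARCHIVE):
        print(f"{size:>10}  {name}")
else:
    print(f"{ARCHIVE} not found; download it from the dataset page first")
```

Listing the members first avoids guessing at file names or encodings inside the archive.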
Documentation
Attachment | Size |
---|---|
Description of the database in a symposium paper | 2.04 MB |
GCOE-NGIT_poster.pdf | 3.86 MB |