Korean stock trading app review dataset
- Submitted by: Jinju Park
- Last updated: Tue, 07/19/2022 - 04:45
- DOI: 10.21227/qqwd-z326
Abstract
This dataset contains Android users' reviews of 24 Korean stock trading apps, crawled from the Google Play Store (https://play.google.com/store/apps) between 2022/4/2 and 2022/4/14. It holds 41,705 reviews in total. For each review, the app name, user ID, review content, rating, and date were collected by web crawling. The entire dataset is in Korean.
Instructions:
- Data and file description
- This dataset contains 119 files, each holding the crawled reviews of one stock trading app at one rating. Each file name starts with the name of the company that owns the app, followed by an abbreviation of the app name if the company owns more than one app, and ends with the rating (1 to 5). Twenty-four apps with five ratings each would give 120 files; there are 119 because one of the apps had no reviews with a rating of 2.
- For example, a file named “daesin_1” is the review data (with rating 1) collected from the main app owned by the company named “Daesin Securities” and a file named “daesin_crayon_1” is the review data (with rating 1) from another app named “Crayon” owned by the same company.
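The naming convention above can be parsed mechanically. The following is a minimal sketch, assuming that company names never contain an underscore (so that everything between the first and last underscore is the app abbreviation); `parse_file_name` and `ReviewFile` are hypothetical names, and because the data format of the files is not stated, any file extension is simply stripped.

```python
import re
from pathlib import PurePath
from typing import NamedTuple, Optional


class ReviewFile(NamedTuple):
    company: str        # company name, e.g. "daesin"
    app: Optional[str]  # app abbreviation, or None for the company's main app
    rating: int         # star rating, 1 to 5


# <company>[_<app>]_<rating>; assumes the company part contains no underscore
_NAME_RE = re.compile(r"^(?P<company>[^_]+)(?:_(?P<app>.+))?_(?P<rating>[1-5])$")


def parse_file_name(name: str) -> ReviewFile:
    """Split a review file name into company, app abbreviation and rating.

    Any extension is stripped first, since the data format is not stated.
    """
    stem = PurePath(name).stem
    m = _NAME_RE.match(stem)
    if m is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return ReviewFile(m.group("company"), m.group("app"), int(m.group("rating")))


print(parse_file_name("daesin_1"))         # ReviewFile(company='daesin', app=None, rating=1)
print(parse_file_name("daesin_crayon_1"))  # ReviewFile(company='daesin', app='crayon', rating=1)
```

The regular expression lets the optional app group backtrack, so a single pattern covers both the main-app and the secondary-app cases.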
- Column description
- Each data file has five columns: app name, user ID, review content, rating, and date.
- A column named “app name(앱이름)” is the full name of the app.
- A column named “user ID(아이디)” is the ID of the user who left the review. Where a user ID was not crawled correctly, the company name appears in its place.
- A column named “review(리뷰)” contains the review text written by the user.
- A column named “rating(별점)” shows how many stars the user gave the app. Each value is a Korean sentence meaning “It received 1 out of 5 stars” (in the case of rating 1). In the Korean sentence the maximum of 5 comes first, so to convert this column to an integer, extract the second number, which is the actual rating.
- A column named “date(날짜)” is the date on which the review was written. Each value is formatted as “yyyy년 mm월 dd일”, where the Korean words “년”, “월”, and “일” mean “year”, “month”, and “day”. It can be converted to any preferred format, such as YYYY-MM-DD.
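The rating and date conversions described above can be sketched with the standard library alone. The sample strings below are hypothetical cell values written to match the column descriptions; the exact wording of the crawled rating sentence is an assumption. Only the rule that the second number is the rating comes from the description. The date pattern also accepts months and days without a leading zero, in case some values are not zero-padded.

```python
import re
from datetime import date


def rating_to_int(text: str) -> int:
    """Return the star rating from the Korean rating sentence.

    Following the column description, the second number in the string is the rating.
    """
    numbers = re.findall(r"\d+", text)
    if len(numbers) < 2:
        raise ValueError(f"expected at least two numbers in {text!r}")
    return int(numbers[1])


def parse_korean_date(text: str) -> date:
    """Convert 'yyyy년 mm월 dd일' to a datetime.date (month/day may lack a leading zero)."""
    m = re.search(r"(\d{4})년\s*(\d{1,2})월\s*(\d{1,2})일", text)
    if m is None:
        raise ValueError(f"unexpected date format: {text!r}")
    year, month, day = map(int, m.groups())
    return date(year, month, day)


# Hypothetical cell values, for illustration only
print(rating_to_int("별표 5개 만점에 1개를 받았습니다."))  # 1
print(parse_korean_date("2022년 4월 2일").isoformat())      # 2022-04-02
```

Calling `.isoformat()` on the result yields the YYYY-MM-DD form mentioned above; other layouts can be produced with `date.strftime`.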
- How to use this data
- This dataset can be used for different types of data analysis projects. I used it for text classification by assigning labels. It can also be used for word tokenization (in Korean), sentiment analysis, topic modeling, and similar tasks.
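The labeling scheme used for the text classification project is not described, so the following is only an illustration of one common choice, not the author's method. It is a hypothetical binary sentiment mapping in which ratings 1 and 2 are negative, 4 and 5 are positive, and 3 is dropped as ambiguous. Tokens here are whitespace-separated units (eojeol); for real Korean word tokenization, a morphological analyzer would normally replace `str.split`.

```python
from typing import Iterable, List, Optional, Tuple


def rating_to_label(rating: int) -> Optional[str]:
    """Hypothetical scheme: 1-2 -> 'negative', 4-5 -> 'positive', 3 -> None (dropped)."""
    if rating not in range(1, 6):
        raise ValueError(f"rating must be 1-5, got {rating}")
    if rating <= 2:
        return "negative"
    if rating >= 4:
        return "positive"
    return None


def build_examples(rows: Iterable[Tuple[str, int]]) -> List[Tuple[List[str], str]]:
    """Turn (review, rating) pairs into (tokens, label) training examples.

    Tokens are whitespace-separated units; reviews with no label are skipped.
    """
    examples = []
    for review, rating in rows:
        label = rating_to_label(rating)
        if label is not None:
            examples.append((review.split(), label))
    return examples


# Hypothetical reviews, for illustration only
print(build_examples([("앱이 자주 멈춰요", 1), ("보통이에요", 3), ("사용하기 편해요", 5)]))
```

Because each file already holds a single rating, the rating could equally be taken from the file name instead of the rating column.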