- Submitted by: Jing Jie Tan
- Last updated: Wed, 01/22/2025 - 04:43
- DOI: 10.21227/dyfp-7f45
Abstract
The Essays-Big5 and Kaggle-MBTI datasets pair text with psychological labels for personality research. Essays-Big5 contains over 2,000 personal essays annotated with Big Five personality traits, which supports the study of linguistic patterns correlated with each personality dimension; its splits are stratified by trait distribution so that each split has balanced representation. Kaggle-MBTI contains 8,000 social media posts labeled with Myers-Briggs Type Indicator (MBTI) types; its splits are likewise stratified to preserve type proportions. Together, the two datasets provide balanced, annotated data for training and evaluating personality models on both essay and social media text.
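# Requires the Hugging Face `datasets` library (pip install datasets).
from datasets import load_dataset
# Download Essays-Big5 from the Hugging Face Hub.
ds = load_dataset("jingjietan/essays-big5")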
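
The abstract says the splits are stratified so that label proportions are preserved. As a rough illustration of that idea, and not the procedure actually used to build these datasets, the sketch below splits a list of single-label examples so that each label keeps approximately the same share in the train and test sets. The function name `stratified_split`, the 80/20 ratio, and the toy labels are all assumptions made for illustration. Essays-Big5 carries several trait labels per essay, so a multi-label variant would be needed there.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with per-label proportions roughly preserved.

    Hypothetical helper for illustration; not part of either dataset's tooling.
    """
    # Group example indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    # Split each label group separately so every label keeps its share.
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (hypothetical; not drawn from Kaggle-MBTI or Essays-Big5).
labels = ["INTJ"] * 50 + ["ENFP"] * 30 + ["ISTP"] * 20
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)

# Per-label counts in the test split mirror the overall label proportions.
print(Counter(labels[i] for i in test_idx))
```

Splitting within each label group, instead of shuffling the whole dataset once, is what keeps rare types from being dropped from the smaller split by chance.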