Standard Dataset
TherapyTalk

- Submitted by: Yeonji Lee
- Last updated: Sun, 04/06/2025 - 10:23
- DOI: 10.21227/r72y-vs69
Abstract
This dataset was built as part of our study, MentalAgora: A Gateway to Advanced Personalized Care in Mental Health through Multi-Agent Debating and Attribute Control. The posts were sourced from mental health-related posts in the Reddit Mental Health Dataset, and selected posts were tagged with responses from mental health professionals. For more details on how the dataset was built, please see the paper.
This dataset consists of 104 Reddit-based conversational examples and is designed to serve as a test set for tasks that involve generating emotionally supportive responses. Each instance includes three sequential posts (post1, post2, and post3) that provide the context of a conversation, along with a corresponding response written by a human annotator. The raw field contains the original Reddit thread data, including metadata such as the author, timestamp, and content of each post, and the subreddit it was posted in. This structure lets models follow the progression of a conversation and generate context-aware, empathetic responses. The dataset is in English, is licensed under CC BY 4.0, and is suitable for evaluating emotional support dialogue systems or other applications that require sensitive and contextually appropriate replies.
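As a rough illustration of the instance structure described above, the following Python sketch joins the three sequential posts of one instance into a single context string that could be passed to a response-generation model. It rests on assumptions: the field names post1, post2, post3, and raw come from the description, while the key name response, the dictionary representation, and the helper build_context are hypothetical, and the example record is invented for illustration rather than taken from the dataset.

```python
from typing import Mapping

# Context fields named in the dataset description, in conversation order.
CONTEXT_FIELDS = ("post1", "post2", "post3")


def build_context(record: Mapping[str, object]) -> str:
    """Join the three sequential posts into one context string (hypothetical helper)."""
    missing = [field for field in CONTEXT_FIELDS if field not in record]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")
    return "\n\n".join(str(record[field]).strip() for field in CONTEXT_FIELDS)


# Invented record for illustration only; not taken from the dataset.
# The "response" key name is an assumption about how the reference reply is stored.
example = {
    "post1": "I have been feeling overwhelmed lately.",
    "post2": "It got worse after I changed jobs.",
    "post3": "I am not sure who to talk to.",
    "response": "placeholder reference response",
    "raw": {},
}

context = build_context(example)
print(context)
```

In an evaluation setting, the generated reply for each context would then be compared against the human-written response for the same instance; the actual file format and key names should be checked against the downloaded files.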