TherapyTalk

Citation Author(s):
Yeonji Lee, Sungkyunkwan University
Sangjun Park, Sungkyunkwan University
Suhyun Han, Sungkyunkwan University
Kyunghyun Cho, New York University
JinYeong Bak, Sungkyunkwan University
Submitted by:
Yeonji Lee
Last updated:
Sun, 04/06/2025 - 10:23
DOI:
10.21227/r72y-vs69

Abstract 

This dataset was built as part of our study, "MentalAgora: A Gateway to Advanced Personalized Care in Mental Health through Multi-Agent Debating and Attribute Control." It was sourced from mental health-related posts in the Reddit Mental Health Dataset, and selected posts are annotated with responses from mental health professionals. For details on how the dataset was built, please see the paper.

Instructions: 

This dataset consists of 104 Reddit-based conversational examples and is designed to serve as a test set for tasks involving the generation of emotionally supportive responses. Each instance includes three sequential posts (post1, post2, and post3) that provide the context of a conversation, along with a corresponding response written by a human annotator. The raw field contains the original Reddit thread data, including metadata such as the author, timestamp, content of each post, and the subreddit it was posted in. This structure allows models to understand the progression of a conversation and generate context-aware, empathetic responses. The dataset is in English, licensed under CC BY 4.0, and is suitable for evaluation in emotional support dialogue systems or other applications requiring sensitive and contextually appropriate replies.
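As a rough illustration of how one instance might be turned into an evaluation pair, the sketch below joins post1, post2, and post3 into a single context string and pairs it with the reference reply. It rests on several assumptions not stated above: that each record is available as a dictionary, that the three post fields hold plain strings, and that the human-written reply is stored under a key named `response` (a hypothetical name). The helper names and the sample record are invented for illustration and are not taken from the dataset.

```python
from typing import Mapping, Tuple

# Field names post1..post3 come from the dataset description.
CONTEXT_FIELDS = ("post1", "post2", "post3")
# Assumption: the reference reply is stored under this key (hypothetical name).
RESPONSE_FIELD = "response"


def build_context(record: Mapping[str, object]) -> str:
    """Join the three sequential posts, in order, into one context string."""
    missing = [name for name in CONTEXT_FIELDS if name not in record]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")
    turns = []
    for index, name in enumerate(CONTEXT_FIELDS, start=1):
        # Assumption: each post field is plain text.
        text = str(record[name]).strip()
        turns.append(f"[Post {index}] {text}")
    return "\n".join(turns)


def to_eval_pair(record: Mapping[str, object]) -> Tuple[str, str]:
    """Return (context, reference reply) for scoring a generated response."""
    return build_context(record), str(record[RESPONSE_FIELD]).strip()


# Invented placeholder record, not real dataset content.
example = {
    "post1": "I have not been sleeping well lately.",
    "post2": "Has anything changed recently?",
    "post3": "I started a new job and feel overwhelmed.",
    "response": "Starting a new job is a big change, and feeling overwhelmed is understandable.",
}

context, reference = to_eval_pair(example)
print(context)
print(reference)
```

A model under evaluation would receive `context` as input, and its output would be compared against `reference`. The raw field is left untouched here because its internal layout is not specified beyond the metadata it contains.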

Funding Agency: 
Ministry of Science and ICT, South Korea
Grant Number: 
NRF-2021M3A9E4080780, IITP-2025-RS-2020-II201821, RS-2024-00509258