TherapyTalk

Citation Author(s):
Yeonji Lee (Sungkyunkwan University)
Sangjun Park (Sungkyunkwan University)
Suhyun Han (Sungkyunkwan University)
Kyunghyun Cho (New York University)
JinYeong Bak (Sungkyunkwan University)

Submitted by: Yeonji Lee
Last updated: Sun, 04/06/2025 - 14:23
DOI: 10.21227/r72y-vs69
Data Format: *.csv


Categories:

Artificial Intelligence

Keywords:

Mental Health Dataset


Abstract

This dataset was built as part of our study "MentalAgora: A Gateway to Advanced Personalized Care in Mental Health through Multi-Agent Debating and Attribute Control." The posts were sourced from mental health-related submissions in the Reddit Mental Health Dataset. Selected posts were then annotated with responses from mental health professionals. For details on how the dataset was built, please see the paper.

Instructions:

This dataset consists of 104 Reddit-based conversational examples. It is intended as a test set for generating emotionally supportive responses. Each instance contains three sequential posts (post1, post2, and post3) that provide the conversational context, together with a corresponding response written by a human annotator. The raw field holds the original Reddit thread data, including metadata such as the author, the timestamp, the text of each post, and the subreddit in which the thread was posted. This structure lets models follow how a conversation progresses and generate context-aware, empathetic responses. The dataset is in English and is licensed under CC BY 4.0. It is suitable for evaluating emotional support dialogue systems and other applications that require sensitive, contextually appropriate replies.
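The field layout described above can be read with Python's standard csv module. The following is a minimal sketch, not an official loader. It makes several assumptions:

- The CSV has a header row with columns named post1, post2, post3 and raw, as named on this page.
- The annotator's reply is in a column called `response`. This name is hypothetical, because the page does not name that column.
- The raw field is kept as an unparsed string, because its exact encoding is not documented here.

The prompt format in `build_prompt` is also purely illustrative.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List

# The page names post1, post2, post3 and raw. The name of the human-written
# reply column is NOT given, so "response" is a hypothetical placeholder.
CONTEXT_COLUMNS = ("post1", "post2", "post3")
RESPONSE_COLUMN = "response"


@dataclass
class Example:
    context: List[str]  # the three sequential posts, in conversation order
    response: str       # reference reply written by a human annotator
    raw: str            # original Reddit thread data, left unparsed


def load_examples(lines: Iterable[str]) -> List[Example]:
    """Parse CSV rows (from a file handle or any iterable of lines) into Examples."""
    reader = csv.DictReader(lines)
    examples = []
    for row in reader:
        context = [row[col] for col in CONTEXT_COLUMNS]
        examples.append(
            Example(context=context, response=row[RESPONSE_COLUMN], raw=row.get("raw", ""))
        )
    return examples


def build_prompt(example: Example) -> str:
    """Join the posts into one numbered context string (an illustrative format only)."""
    lines = [f"Post {i}: {text}" for i, text in enumerate(example.context, start=1)]
    lines.append("Supportive reply:")
    return "\n".join(lines)
```

For example, the following call would load the test set into a list of `Example` objects:

```python
with open("therapytalk.csv", newline="", encoding="utf-8") as f:
    examples = load_examples(f)
```

The file name here is hypothetical. According to this page, the full set should contain 104 examples. Opening the file with `newline=""` lets the csv module correctly handle quoted fields that contain line breaks, which the raw thread data may include.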

Funding Agency

Ministry of Science and ICT, South Korea

Grant Number

NRF-2021M3A9E4080780, IITP-2025-RS-2020-II201821, RS-2024-00509258