Kangning Dataset of Clinical Interview for Depression

Citation Author(s):
Kaining Mao, Deborah Baofeng Wang, Tiansheng Zheng, Rongqi Jiao, Yanhui Zhu, Bin Wu, Lei Qian, Wei Lyu, Jie Chen (University of Alberta), Minjie Ye
Submitted by:
Kaining Mao
Last updated:
Mon, 04/08/2024 - 19:24
DOI:
10.21227/b8rw-gb61
Research Article Link:
License:

Abstract 

 


We're excited to present a unique challenge aimed at advancing automated depression diagnosis. Traditional methods that rely on written speech or self-reported measures often fall short in real-world scenarios. To address this, we have curated a dataset of authentic clinical interviews for depression, collected at a psychiatric hospital.

The dataset includes 113 recordings (89 for training and 24 for testing), featuring 52 healthy individuals and 61 individuals diagnosed with depression. Each participant was assessed with the Chinese version of the Montgomery-Åsberg Depression Rating Scale (MADRS), and diagnoses were confirmed by psychiatry specialists.

The interviews were audio-recorded, transcribed, and annotated by experienced physicians to ensure data quality. Challenge participants are tasked with developing machine learning models that detect the presence of depression and predict its severity using audio and text features extracted from the interviews.

Join us in leveraging this groundbreaking dataset to revolutionize depression diagnosis and advance mental health care. Let's make a difference together!

Instructions: 

Dataset Description


Data Files

  • train.zip - Contains 89 clinical interview audio recordings in MP3 format.
  • train_json.zip - Provides transcriptions for the corresponding audio recordings.
  • test.zip - Contains 24 clinical interview audio recordings in MP3 format.
  • test_json.zip - Provides transcriptions for the corresponding audio recordings.
  • train.csv - Metadata for the training set, including audio filenames, participant gender, and age.
  • test.csv - Metadata for the test set, including participant IDs, gender, and age.
  • sample_submission.csv - A template for participants to submit their predictions in the correct format.

Columns

  • Participant - Participant ID
  • File_name - File name of the interview recording
  • Gender - Participant gender
  • Age - Participant age
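
As a quick start, the metadata can be read with any CSV reader; the sketch below (assuming pandas is installed, and using the column names listed above) prints a summary of the training set:

import pandas as pd

# Load the training-set metadata (one row per recording).
train_meta = pd.read_csv("train.csv")
print(train_meta.head())

# Class balance by gender and basic age statistics.
print(train_meta["Gender"].value_counts())
print(train_meta["Age"].describe())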

Transcripts

Transcripts are provided in JSON format, containing start and end times, speaker identification, and word-level details. Each transcript entry includes the segment's begin and end times (bg, ed), the one-best transcription (onebest), an si field, a speaker ID (speaker), and a list of word-level results (wordsResultList) with per-word timing, confidence, and type information.
Example transcript entry:

{
 "data": [
   {
     "bg": "240",
     "ed": "1160",
     "onebest": "都去找,",
     "si": "0",
     "speaker": "1",
     "wordsResultList": [
       {
         "alternativeList": [],
         "wc": "1.0000",
         "wordBg": "4",
         "wordEd": "38",
         "wordsName": "都",
         "wp": "n"
       },
       {
         "alternativeList": [],
         "wc": "1.0000",
         "wordBg": "39",
         "wordEd": "55",
         "wordsName": "去",
         "wp": "n"
       },
       {
         "alternativeList": [],
         "wc": "1.0000",
         "wordBg": "56",
         "wordEd": "83",
         "wordsName": "找",
         "wp": "n"
       },
       {
         "alternativeList": [],
         "wc": "0.0000",
         "wordBg": "83",
         "wordEd": "83",
         "wordsName": ",",
         "wp": "p"
       }
     ]
   },
   {
     "bg": "1600",
     "ed": "3450",
     "onebest": "嗯没有紧张。",
     "si": "0",
     "speaker": "1",
     "wordsResultList": [
       {
         "alternativeList": [],
         "wc": "1.0000",
         "wordBg": "32",
         "wordEd": "44",
         "wordsName": "嗯",
         "wp": "s"
       },
       {
         "alternativeList": [],
         "wc": "1.0000",
         "wordBg": "45",
         "wordEd": "109",
         "wordsName": "没有",
         "wp": "n"
       }
     ]
   }
 ]
}
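
To illustrate how these fields can be used, the sketch below (a hypothetical helper; bg and ed are assumed to be millisecond offsets into the recording) tallies the number of utterances and the total speaking time for each speaker code in one transcript:

import json
from collections import defaultdict

def speaking_time(json_path):
    # Load the transcript and iterate over its segments.
    with open(json_path, encoding="utf-8") as f:
        segments = json.load(f)["data"]
    totals_ms = defaultdict(int)   # total speaking time per speaker code (assumed ms)
    counts = defaultdict(int)      # number of utterances per speaker code
    for seg in segments:
        counts[seg["speaker"]] += 1
        totals_ms[seg["speaker"]] += int(seg["ed"]) - int(seg["bg"])
    return dict(counts), dict(totals_ms)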

 

The following code snippet shows one way to parse the JSON transcript files and export them to CSV:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sat May 23 21:36:02 2020
@author: Kaining
"""
import csv
import json
import logging
import os


def load_json(file_path):
    """Load an iFlyTek *.json transcript and return its segment list and the subject ID."""
    with open(file_path, "r", encoding="utf-8") as file:
        data_dict = json.load(file)
    segments = data_dict["data"]
    # The subject ID is the transcript file name without its extension.
    subject_id = os.path.splitext(os.path.basename(file_path))[0]
    return segments, subject_id


# Speaker codes assigned by the recogniser; the mapping is flipped below
# depending on which speaker opens the interview.
mapping = {'1': 'Doctor', '2': 'Patient'}


def write_to_csv(dest_path, data_dict, subject_id):
    """Write the transcript segments to a *.csv file, dropping the 'si' field."""
    data_to_write = []
    # Header row: every key of the first segment except 'si'.
    data_to_write.append([key for key in data_dict[0].keys() if key != "si"])
    for i, segment in enumerate(data_dict):
        row = []
        for key in segment.keys():
            if key == "si":
                continue
            if key == "speaker":
                # The first segment decides which code is the doctor and which is the patient.
                if i == 0:
                    if segment[key] == "1":
                        mapping['1'], mapping['2'] = "Patient", "Doctor"
                    else:
                        mapping['1'], mapping['2'] = "Doctor", "Patient"
                row.append(mapping.get(segment[key], segment[key]))
            elif key == "wordsResultList":
                # Keep only the recognised words, stripped of line breaks.
                row.append([item["wordsName"].replace('\n', '').replace('\r', '')
                            for item in segment["wordsResultList"]])
            else:
                row.append(segment[key])
        data_to_write.append(row)
    with open(dest_path, 'w', encoding='utf-8', newline='') as file:
        writer = csv.writer(file, delimiter=',')
        writer.writerows(data_to_write)
    logging.info("Finished writing %s", dest_path)
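
A possible driver for the two functions above (the directory names are hypothetical) converts every extracted transcript to CSV:

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    src_dir = "train_json"  # hypothetical: folder holding the extracted *.json transcripts
    dst_dir = "train_csv"   # hypothetical: output folder for the converted *.csv files
    os.makedirs(dst_dir, exist_ok=True)
    for name in sorted(os.listdir(src_dir)):
        if name.endswith(".json"):
            segments, subject_id = load_json(os.path.join(src_dir, name))
            write_to_csv(os.path.join(dst_dir, subject_id + ".csv"), segments, subject_id)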