Name: SOS-HL-1K
Creator: Guanghui Fu
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Social Sciences, Health

Abstract

We collected the data by crawling comments from the “Zoufan” blog on the Weibo social platform. A team of qualified psychologists then annotated each post. Strict preprocessing measures were applied to protect users’ privacy.

SOS-HL-1K (Suicide Risk Classification)

Categories: High risk, Low risk
Number of Samples:
- High risk: 601
- Low risk: 648
Data Split:
- Training set: 999 samples
- Test set: 250 samples
Average Number of Words per Post: 47.79
Labels: Each post is labeled with either 'high risk' or 'low risk'.
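
The figures above can be cross-checked with a few lines of arithmetic. The sketch below only uses the counts listed in this description; the variable names are illustrative, and the majority-class share is computed over the whole dataset because the per-split class distribution is not stated here.

```python
# Counts copied from the dataset description above.
class_counts = {"high risk": 601, "low risk": 648}
split_counts = {"train": 999, "test": 250}

# Both breakdowns should describe the same set of posts.
total = sum(class_counts.values())
assert total == sum(split_counts.values())

# Share of each class and of each split, as fractions of the total.
class_share = {name: n / total for name, n in class_counts.items()}
split_share = {name: n / total for name, n in split_counts.items()}

# Accuracy of always predicting the larger class over the full dataset.
# This is only a rough reference point, not a test-set baseline.
majority_share = max(class_counts.values()) / total

print(f"total posts: {total}")
print(f"high-risk share: {class_share['high risk']:.3f}")
print(f"test share: {split_share['test']:.3f}")
print(f"majority-class share: {majority_share:.3f}")
```

The two classes are close to balanced, and the split is roughly 80/20 between training and test.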

Instructions:

If you use this dataset in your research, please cite the following paper:

@misc{qi2023evaluating,
  title={Evaluating the Efficacy of Supervised Learning vs Large Language Models for Identifying Cognitive Distortions and Suicidal Risks in Chinese Social Media},
  author={Hongzhi Qi and Qing Zhao and Changwei Song and Wei Zhai and Dan Luo and Shuo Liu and Yi Jing Yu and Fan Wang and Huijing Zou and Bing Xiang Yang and Jianqiang Li and Guanghui Fu},
  year={2023},
  eprint={2309.03564},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

Funding Agency: National Natural Science Foundation of China
Grant Numbers: 72174152, 72304212, and 82071546

Dataset Files

SOS-HL-1K.tsv (119.21 kB)
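
A minimal sketch of reading the file with Python's standard library follows. The file layout is not documented on this page, so the sketch assumes a header row and UTF-8 encoding; the helper names `read_tsv` and `label_counts` and the column name `label` in the usage comment are hypothetical and should be adjusted after inspecting the actual header.

```python
import csv
from collections import Counter


def read_tsv(path, encoding="utf-8"):
    """Read a tab-separated file with a header row into a list of dicts.

    Assumption: the first line holds column names. QUOTE_NONE keeps any
    quotation marks inside post text as literal characters.
    """
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)


def label_counts(rows, label_column):
    """Count how many rows carry each value of the given label column."""
    return Counter(row[label_column] for row in rows)


# Example usage (column name "label" is an assumption, not documented):
# rows = read_tsv("SOS-HL-1K.tsv")
# print(rows[0].keys())               # inspect the real column names first
# print(label_counts(rows, "label"))  # compare with the counts listed above
```

If the header differs from the assumption, printing the keys of the first row shows the real column names before any counting is attempted.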
