Yida Bao

Shanghai Dialect and Mandarin

This dataset is designed for classifying textual transcriptions of spoken conversations as either Shanghai dialect or Mandarin Chinese. It consists of high-quality, manually transcribed texts from natural dialogues, each annotated with a binary language label (Shanghai dialect: 1, Mandarin: 0). The dataset aims to facilitate research in text-based dialect classification, natural language processing (NLP), and linguistic variation analysis.
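As an illustration of how the 1/0 labels might feed a text classifier, here is a minimal sketch of a character-unigram multinomial naive Bayes baseline with add-one smoothing. This page does not state the file format or recommend a particular model, so the sketch assumes the transcriptions and labels have already been loaded into Python lists. The class name `NaiveBayes`, the helper `features`, and the training sentences are all hypothetical. The sentences are toy examples written for illustration and are not taken from the dataset. They use Shanghainese markers such as 侬 ("you") and the question particle 伐.

```python
import math
from collections import Counter

# Label convention stated on the dataset page.
SHANGHAI, MANDARIN = 1, 0


def features(text):
    """Character unigrams (hypothetical helper). Chinese text has no word
    boundaries, so characters serve as a tokenization-free feature set."""
    return [c for c in text if not c.isspace()]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing on features.
    Assumes the training data contains at least one example of each label."""

    def fit(self, texts, labels):
        self.counts = {SHANGHAI: Counter(), MANDARIN: Counter()}
        self.docs = Counter(labels)  # number of documents per label
        for text, y in zip(texts, labels):
            self.counts[y].update(features(text))
        self.vocab = set().union(*self.counts.values())
        self.totals = {y: sum(c.values()) for y, c in self.counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.docs.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for y in (MANDARIN, SHANGHAI):
            # log prior + sum of smoothed log likelihoods of each character
            score = math.log(self.docs[y] / n_docs)
            for f in features(text):
                score += math.log((self.counts[y][f] + 1) / (self.totals[y] + v))
            if score > best_score:
                best, best_score = y, score
        return best


# Toy, hypothetical training data (not drawn from the dataset).
train_texts = ["阿拉今朝去白相", "侬吃过饭了伐", "伊讲勿来",
               "我们今天去玩", "你吃过饭了吗", "他说不出来"]
train_labels = [SHANGHAI, SHANGHAI, SHANGHAI, MANDARIN, MANDARIN, MANDARIN]
clf = NaiveBayes().fit(train_texts, train_labels)
```

With this toy model, `clf.predict("侬好伐")` returns `1` (Shanghai dialect) and `clf.predict("你好吗")` returns `0` (Mandarin). Character n-grams are a common choice for Chinese dialect identification because they need no word segmenter. Extending `features` to include character bigrams would capture multi-character markers such as 阿拉 ("we").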

Categories:

Artificial Intelligence
Machine Learning
