This dataset supports binary classification of transcribed spoken conversations as either Shanghai dialect or Mandarin Chinese. It consists of high-quality, manually transcribed texts from natural dialogues, each annotated with a language label: 1 for Shanghai dialect, 0 for Mandarin.
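The label encoding can be sketched in a few lines of Python. This is a minimal illustration, not the dataset's official loader: the record field names `text` and `label`, the helper `label_name`, and the two sample records are hypothetical placeholders. Only the 0/1 mapping comes from the description.

```python
# Label encoding as stated in the description: 1 = Shanghai dialect, 0 = Mandarin.
LABEL_NAMES = {0: "Mandarin", 1: "Shanghai dialect"}


def label_name(label: int) -> str:
    """Return the language name for an integer label (0 or 1)."""
    if label not in LABEL_NAMES:
        raise ValueError(f"unexpected label: {label!r}")
    return LABEL_NAMES[label]


# Hypothetical records: the field names and texts are illustrative
# assumptions, not actual rows from the dataset.
examples = [
    {"text": "侬好", "label": 1},
    {"text": "你好", "label": 0},
]

# Pair each text with its decoded language name.
decoded = [(ex["text"], label_name(ex["label"])) for ex in examples]

for text, name in decoded:
    print(f"{text}\t{name}")
```

Raising on unknown labels, rather than silently defaulting to one class, makes it easier to catch malformed rows before training a classifier.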