This corpus collects authentic dialogues from clinical encounters alongside AI-generated interactions. It covers a wide range of medical scenarios and pairs genuine exchanges with those produced by AI models, so the two kinds of response can be studied side by side. Researchers and practitioners can use the dataset to study communication patterns in medical contexts, refine AI models, and improve healthcare delivery systems. The corpus is intended as a resource for both medical communication research and AI development.


The Doctor-ChatGPT (DC) dataset contains 3760 instances in two classes. The Doctor-Rephrased Doctor (DR) dataset also contains 3760 instances in two classes. The Doctor-ChatGPT-Rephrased Doctor (DCR) dataset combines interactions from both and contains 5640 instances across three classes. Doctor responses are labeled class 0, ChatGPT responses class 1, and rephrased Doctor responses class 2.
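The label scheme above can be sketched in a few lines of Python. This is only an illustrative sketch: the file format and column layout are not stated here, so the records are modeled as plain (text, label) pairs, and the names `CLASS_NAMES`, `SUBSET_LABELS`, `select_subset`, and `class_counts` are hypothetical. In particular, it assumes the DR subset keeps label 2 for rephrased Doctor responses, which the description does not confirm.

```python
from collections import Counter

# Label scheme as stated in the description.
CLASS_NAMES = {0: "doctor", 1: "chatgpt", 2: "rephrased_doctor"}

# Hypothetical subset definitions: which labels each named dataset draws on.
# ASSUMPTION: DR keeps label 2 rather than relabeling it; check the actual file.
SUBSET_LABELS = {
    "DC": {0, 1},
    "DR": {0, 2},
    "DCR": {0, 1, 2},
}


def select_subset(records, subset):
    """Return the (text, label) records whose label belongs to the named subset."""
    allowed = SUBSET_LABELS[subset]
    return [(text, label) for text, label in records if label in allowed]


def class_counts(records):
    """Count instances per class name."""
    return Counter(CLASS_NAMES[label] for _, label in records)


# Toy records for illustration only; these are NOT taken from the corpus.
toy = [
    ("placeholder doctor response", 0),
    ("placeholder ChatGPT response", 1),
    ("placeholder rephrased response", 2),
    ("another placeholder doctor response", 0),
]

dc = select_subset(toy, "DC")
dr = select_subset(toy, "DR")
```

Running `class_counts` on each subset of the real file is a quick way to check the instance counts quoted above before training on the data.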

Data Descriptor Article DOI: 


All instances in the dataset are consolidated into this single file.

Submitted by Olumide Ojo on Wed, 03/20/2024 - 08:38