HLS-CMDS: Heart and Lung Sounds Dataset Recorded from a Clinical Manikin using Digital Stethoscope

- Citation Author(s): Shahram Shirani, James P. Reilly
- Submitted by: Yasaman Torabi
- DOI: 10.21227/fe0m-k110
Abstract
This dataset contains 535 recordings of heart and lung sounds captured from a clinical manikin using a digital stethoscope. It comprises 50 individual heart sounds, 50 individual lung sounds, and 145 mixed heart and lung sounds. For each mixed sound, the corresponding source heart sound (145 recordings) and source lung sound (145 recordings) were also recorded, which gives 50 + 50 + 3 × 145 = 535 recordings in total. The recordings cover different anatomical chest locations and include both normal and abnormal sounds. Each recording has been filtered to highlight specific sound types, making the dataset valuable for artificial intelligence (AI) research and applications. If you use this dataset, please cite: Y. Torabi, S. Shirani, and J. P. Reilly, “Descriptor: Heart and Lung Sounds Dataset Recorded from a Clinical Manikin using Digital Stethoscope (HLS-CMDS),” in IEEE Data Descriptions, https://doi.org/10.1109/IEEEDATA.2025.3566012.
Instructions:
This dataset consists of 535 .wav audio files and accompanying CSV metadata files.
To use the dataset:
1. Download and extract the provided zip file.
2. Refer to the README file for detailed descriptions of the folder structure and file naming conventions.
3. Use the CSV metadata files (Hs.csv, Ls.csv, Mix.csv) to match audio files with gender, sound type, auscultation location, and sound IDs.
4. Each .wav file is a 15-second recording sampled at 22,050 Hz, ready for input into audio analysis pipelines.
5. Researchers can use the dataset for supervised or unsupervised learning, source separation, or cardiopulmonary classification tasks.
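The steps above can be sketched in Python as a quick sanity check after extraction. This is a minimal illustration using only the standard library, not the authors' tooling. It assumes the .wav files are PCM-encoded (the `wave` module cannot read other encodings) and makes no assumption about the CSV column names, which are documented in the README. The function names and the 0.5-second duration tolerance are hypothetical choices made for this example.

```python
import csv
import wave
from pathlib import Path

# Values stated in the dataset description.
EXPECTED_RATE_HZ = 22_050
EXPECTED_DURATION_S = 15.0


def describe_wav(path):
    """Return (sample_rate_hz, duration_s) read from a PCM .wav header."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return rate, duration


def matches_description(path, tolerance_s=0.5):
    """Check one file against the stated 15 s / 22,050 Hz format.

    tolerance_s is an assumption of this sketch, allowing for small
    differences in the exact recording length.
    """
    rate, duration = describe_wav(path)
    return (rate == EXPECTED_RATE_HZ
            and abs(duration - EXPECTED_DURATION_S) <= tolerance_s)


def find_unexpected_files(root):
    """Return the .wav files under root that do not match the description."""
    return sorted(p for p in Path(root).rglob("*.wav")
                  if not matches_description(p))


def load_metadata(csv_path):
    """Read one metadata CSV into a list of dicts keyed by its header row.

    No column names are assumed here; inspect rows[0].keys() and the
    README to see which fields are available.
    """
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))
```

For instance, `find_unexpected_files("HLS-CMDS")` would list any recordings that deviate from the stated format, and `load_metadata("HLS-CMDS/Mix.csv")` would return the rows describing the mixed recordings. The folder name `HLS-CMDS` and the location of `Mix.csv` inside it are placeholders; the actual layout is described in the README.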