VOCs from blood culture broth samples

Citation Author(s):
Michael Bastos, Universidade Federal de Pernambuco, Centro de Informática
Leandro Almeida, Universidade Federal de Pernambuco, Centro de Informática
Clayton Benevides, Comissão Nacional de Energia Nuclear
Margaret Powers-Fletcher, University of Cincinnati, College of Medicine
Christina Cox, University of Cincinnati, College of Medicine
Submitted by: Michael Bastos
Last updated: Sat, 04/19/2025 - 12:51
DOI: 10.21227/2341-nm71

Abstract 

This dataset comprises volatile organic compound (VOC) profiles collected from blood culture broth samples using an electronic nose (E-nose) system. The samples include cultures positive for Candida spp. (C. albicans, C. glabrata, and C. tropicalis, among others) as well as negative control samples. Each sample was exposed to the E-nose sensor array, which consists of multiple gas sensors sensitive to different VOC families.

The raw sensor responses were recorded over a fixed acquisition period, capturing temporal patterns and intensity variations in VOC emissions. The dataset is intended for the development and evaluation of machine learning and deep learning models for automated fungal infection detection, specifically targeting rapid and non-invasive identification of Candida species in clinical environments.

Data includes:

  • Time-series sensor outputs for each broth sample

  • Label information (e.g., species identification or control)

 

This dataset supports applications in medical diagnostics, artificial olfaction, and biosensor data modeling.

Instructions: 

Dataset Summary: VOC Time Series from Blood Culture Broth – Candida glabrata

Filename: 1_Glabrata_01_17_08_2024.csv
Sample Type: Candida glabrata
Number of Records: 735 (time points)
Data Type: Raw time-series sensor readings from an electronic nose
Sensor Channels: Appears to include multiple gas sensors and environmental sensors (e.g., temperature, pressure)
File Encoding: Not specified. The file contains marker lines (e.g., "#### New Purge ####") that appear to indicate phase changes or session breaks.

Data Description

Each row in the file appears to represent a single time point during the exposure of a Candida glabrata sample to the E-nose. The values include:

  • Sensor outputs (likely raw voltages or resistance values)

  • Environmental data (e.g., temperature, humidity)

  • Label/class at the end of the row (glabrata), indicating the organism in the sample
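The row layout described above can be illustrated with a minimal sketch. It assumes that fields are separated by commas, spaces, or tabs and that the last field is the class label; `parse_row` is a hypothetical helper, and the sample row is made up for illustration rather than taken from the file.

```python
import re


def parse_row(line):
    """Split one data row into numeric readings and a trailing class label."""
    # Assumption: fields are separated by commas, spaces, or tabs.
    tokens = [t for t in re.split(r"[,\s]+", line.strip()) if t]
    if len(tokens) < 2:
        raise ValueError("expected at least one reading and a label")
    *values, label = tokens  # the last field is treated as the label
    return [float(v) for v in values], label


# Illustrative row only; real rows may have more channels.
readings, label = parse_row("0 1.0 2.5 3.0 glabrata")
print(readings, label)
```

If the real file turns out to use a different delimiter or to place the label elsewhere, only the split pattern and the unpacking line would need to change.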

How to Use the Data

To use this file in AI or ML workflows:

  1. Preprocessing Steps:

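    • Remove marker lines such as "#### New Purge ####", which appear to separate acquisition phases rather than hold data.

    • Split each row into separate columns using space or tab delimiters (check the actual delimiter against the file before parsing).

    • Assign column names, such as: Time, Sensor1, Sensor2, ..., Temp, Humidity, Label.

    • Normalize sensor readings if the chosen model requires it (e.g., z-score or min-max scaling per sensor channel).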

  2. Recommended Structure (after processing):

    Time  Sensor1  Sensor2  Sensor3  Sensor4  Temp   Humidity  Label
    1486  1007.0   1315.0   558.3    71.0     127.0  80.0      glabrata
    ...   ...      ...      ...      ...      ...    ...       ...

  3. AI Use Cases:

    • Classification: Train a model (e.g., LSTM, SVM) to identify Candida species from VOC patterns.

    • Feature extraction: Analyze patterns over time for each sensor.

    • Anomaly detection: Identify non-standard responses or rare species.

  4. Suggested Tools:

    • pandas for data cleaning and transformation

    • scikit-learn or sktime for model development

    • Matplotlib/Seaborn for data visualization

    • tsfresh or catch22 for time series feature extraction
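The preprocessing steps in item 1 can be sketched with the standard library alone. This is a rough illustration under stated assumptions, not the authors' pipeline: marker lines are assumed to start with "#", fields are assumed to be separated by commas, spaces, or tabs, and the column names, the helpers `load_rows` and `zscore`, and the sample lines are all hypothetical.

```python
import re
from statistics import mean, pstdev


def load_rows(lines, columns):
    """Drop marker and blank lines, split fields, and map them to column names."""
    rows = []
    for line in lines:
        text = line.strip()
        # Assumption: marker lines such as "#### New Purge ####" start with '#'.
        if not text or text.startswith("#"):
            continue
        tokens = [t for t in re.split(r"[,\s]+", text) if t]
        if len(tokens) != len(columns):
            continue  # skip rows that do not match the assumed layout
        try:
            numbers = [float(t) for t in tokens[:-1]]
        except ValueError:
            continue  # skip header-like rows with non-numeric fields
        row = dict(zip(columns[:-1], numbers))
        row[columns[-1]] = tokens[-1]  # last field is kept as the label
        rows.append(row)
    return rows


def zscore(rows, names):
    """Standardize the named columns in place (zero mean, unit variance)."""
    if not rows:
        return rows
    for name in names:
        values = [r[name] for r in rows]
        mu, sigma = mean(values), pstdev(values)
        for r in rows:
            r[name] = (r[name] - mu) / sigma if sigma else 0.0
    return rows


# Hypothetical column layout and made-up sample lines.
COLUMNS = ["Time", "Sensor1", "Sensor2", "Temp", "Humidity", "Label"]
sample = [
    "#### New Purge ####",
    "0 10.0 20.0 25.0 50.0 glabrata",
    "1 12.0 24.0 25.0 50.0 glabrata",
    "2 14.0 28.0 25.0 50.0 glabrata",
]
rows = zscore(load_rows(sample, COLUMNS), ["Sensor1", "Sensor2"])
print(rows[0])
```

The resulting list of dictionaries can be passed to `pandas.DataFrame(rows)` for further work. When training a classifier, the normalization statistics would normally be computed on the training samples only and then reused for held-out samples.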

Additional Notes

 

  • This file seems to be part of a larger dataset with multiple species. Standardizing the structure across files will help in building a robust multi-class classifier.

  • Ensure correct encoding when loading the file to avoid parsing issues due to special characters.
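Because the file encoding is not documented, one cautious way to load the file is to try a short list of encodings in order. The sketch below is an assumption about what might work, not a statement of how the files were written; `read_text_with_fallback` and the encoding list are hypothetical choices.

```python
from pathlib import Path


def read_text_with_fallback(path, encodings=("utf-8-sig", "cp1252", "latin-1")):
    """Return (text, encoding) using the first encoding that decodes cleanly."""
    raw = Path(path).read_bytes()
    for enc in encodings:
        try:
            # "utf-8-sig" also accepts plain UTF-8 and strips a leading BOM.
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError(f"could not decode {path} with any of {encodings}")


# Usage (file name taken from the summary above; the file must exist locally):
# text, used = read_text_with_fallback("1_Glabrata_01_17_08_2024.csv")
# lines = text.splitlines()
```

Since latin-1 maps every byte to a character, the last fallback always succeeds, so it is worth checking the reported encoding and inspecting a few lines before trusting the parsed values.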

Funding Agency: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Grant Number: 001

Dataset Files

    Files have not been uploaded for this dataset

    Documentation

    Attachment                     Size
    VOC_Dataset_Usage_Guide.pdf    52.35 KB