Enhanced Cardiovascular Disease Dataset with Data Augmentation
- Submitted by: Jose Lopez Saynes
- Last updated: Mon, 03/03/2025 - 15:18
- DOI: 10.21227/v8bh-y702
Abstract
This dataset comprises 2 million synthetic samples generated using the Variational Autoencoder-Generative Adversarial Network (VAE-GAN) technique. The dataset is designed to facilitate cardiovascular disease prediction through various demographic, physical, and health-related attributes. It contains essential physiological and behavioral indicators that contribute to cardiovascular health.
Dataset Description
The dataset consists of the following features:
- Age (int, days): The age of the individual in days.
- Height (int, cm): The height of the individual in centimeters.
- Weight (float, kg): The weight of the individual in kilograms.
- Body Mass Index (BMI) (float): Calculated as weight (kg) / height (m)^2, providing an indicator of body fat.
- Gender (categorical code): Encoded as 1 for female and 2 for male.
- Systolic Blood Pressure (ap_hi) (int): The maximum arterial pressure during heartbeats.
- Diastolic Blood Pressure (ap_lo) (int): The minimum arterial pressure between heartbeats.
- Cholesterol (categorical): 1 for normal, 2 for above normal, and 3 for well above normal levels.
- Glucose (categorical): 1 for normal, 2 for above normal, and 3 for well above normal levels.
- Smoking (binary): 1 if the individual smokes, 0 otherwise.
- Alcohol Intake (binary): 1 if the individual consumes alcohol, 0 otherwise.
- Physical Activity (binary): 1 if the individual engages in regular physical activity, 0 otherwise.
Target Variable
- Cardiovascular Disease (cardio) (binary): The presence (1) or absence (0) of cardiovascular disease.
Together, these twelve features and the binary target can be used to train machine learning models for cardiovascular disease prediction and to study health-related risk factors and prevention strategies.
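As an illustration of how the Age and BMI features relate to the raw measurements, the following minimal sketch converts age from days to years and recomputes BMI from height and weight. The column names age, height, and weight, the 365.25-day year, and the sample values are assumptions made for illustration only; check them against the actual file header before use.

```python
import pandas as pd

# Small hand-made frame standing in for the real data.
# Column names "age", "height", "weight" are assumptions, not confirmed
# by the dataset description.
sample = pd.DataFrame({
    "age": [18250, 21900],     # age in days
    "height": [170, 160],      # height in cm
    "weight": [70.0, 80.0],    # weight in kg
})


def add_derived_columns(df):
    """Return a copy with age in years and BMI recomputed from height/weight."""
    out = df.copy()
    # Assumed conversion: 365.25 days per year.
    out["age_years"] = out["age"] / 365.25
    # Standard BMI definition: weight (kg) divided by height (m) squared.
    height_m = out["height"] / 100.0
    out["bmi"] = out["weight"] / height_m ** 2
    return out


derived = add_derived_columns(sample)
print(derived.round(2))
```

Recomputing BMI this way can serve as a sanity check on the BMI column shipped with the file, since synthetic samples may not satisfy the formula exactly.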
Instructions for Using the Dataset
- Download the Dataset
  - Download the dataset file from IEEE DataPort.
- Install Required Libraries
  - Ensure that you have the necessary libraries installed, such as pandas and numpy.
- Load the Dataset in Python
  - Once the file is downloaded, load it into your Python environment using an appropriate tool (e.g., pandas for .csv files).
- Explore the Dataset
  - Review the dataset to understand the columns and the types of data it contains.
- Handle Missing Data
  - If there are missing values in the dataset, you can choose to remove rows with missing data or fill them with appropriate values.
- Select Relevant Features
  - Select the columns or features that are important for your analysis or modeling.
- Prepare for Analysis
  - Prepare the data, ensuring that the variables are in the correct format for analysis or modeling.
- Save the Processed Dataset
  - After making modifications or cleaning the data, save the processed dataset into a new file for future use.
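The steps above can be sketched with pandas as follows. This is a minimal sketch under stated assumptions, not the dataset author's procedure: the file paths passed to process_file are hypothetical, and apart from ap_hi, ap_lo, and cardio, the column names are assumptions to be checked against the file header. Dropping incomplete rows is only one of the two options mentioned above; filling values is equally valid.

```python
import pandas as pd

# Assumed column names; only ap_hi, ap_lo and cardio appear in the
# dataset description. Adjust to match the real file header.
FEATURES = ["age", "height", "weight", "ap_hi", "ap_lo"]
TARGET = "cardio"


def prepare(df, features=FEATURES, target=TARGET):
    """Select columns, drop rows with missing values, and fix the target dtype."""
    cols = list(features) + [target]
    out = df[cols].dropna().copy()          # select features, remove incomplete rows
    out[target] = out[target].astype(int)   # binary target stored as integer
    return out


def process_file(src_path, dst_path):
    """Load a CSV, print a short overview, clean it, and save the result."""
    df = pd.read_csv(src_path)              # load
    df.info()                               # explore: columns, dtypes, non-null counts
    print(df.isna().sum())                  # missing values per column
    clean = prepare(df)
    clean.to_csv(dst_path, index=False)     # save the processed copy
    return clean


# In-memory demonstration with made-up values (one row has a missing weight).
demo = pd.DataFrame({
    "age": [18250, 21900, 20000],
    "height": [170, 160, 165],
    "weight": [70.0, None, 64.5],
    "ap_hi": [120, 140, 130],
    "ap_lo": [80, 90, 85],
    "cardio": [0, 1, 0],
})
cleaned = prepare(demo)
print(cleaned)
```

For the real file, a call such as process_file("cardio_data.csv", "cardio_clean.csv") would run the same steps end to end; both file names are placeholders.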
Documentation
Attachment | Size |
---|---|
1.65 KB |