Open Access
Heart Disease Dataset (Comprehensive)
- Submitted by: MANU SIDDHARTHA
- Last updated: Fri, 11/06/2020 - 04:17
- DOI: 10.21227/dz4t-cm36
Abstract
This heart disease dataset was curated by combining five popular heart disease datasets that were previously available independently but had never been combined. The five datasets are merged over 11 common features, which makes this the largest heart disease dataset available so far for research purposes. The five source datasets are:
- Cleveland
- Hungarian
- Switzerland
- Long Beach VA
- Statlog (Heart) Data Set.
This dataset consists of 1190 instances with 11 features. The source datasets were collected and combined in one place to help advance research on machine learning and data mining algorithms for coronary artery disease (CAD), and ultimately, it is hoped, to improve clinical diagnosis and early treatment.
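A quick way to confirm the stated size after downloading is to load the CSV and count rows and columns. The sketch below uses only Python's standard library and assumes that the file has a single header row, that every cell is numeric, and that one class-label column follows the 11 features; none of these details are stated on this page. The helper names `load_rows` and `shape_matches` and the sample column names are hypothetical.

```python
import csv
import io
from typing import IO, List, Sequence, Tuple


def load_rows(fh: IO[str]) -> Tuple[List[str], List[List[float]]]:
    """Read the header row, then convert every data cell to float."""
    reader = csv.reader(fh)
    header = next(reader)
    # csv.reader yields [] for blank lines; skip them.
    rows = [[float(cell) for cell in record] for record in reader if record]
    return header, rows


def shape_matches(
    header: Sequence[str],
    rows: Sequence[Sequence[float]],
    expected_rows: int = 1190,
    expected_features: int = 11,
) -> bool:
    """Check row/column counts, assuming one extra trailing label column."""
    return len(rows) == expected_rows and len(header) == expected_features + 1


# Tiny in-memory stand-in for the real CSV (hypothetical columns and values).
sample = io.StringIO("age,sex,target\n40,1,0\n49,0,1\n")
header, rows = load_rows(sample)
print(shape_matches(header, rows, expected_rows=2, expected_features=2))  # True
```

With the real file, one would open `heart_statlog_cleveland_hungary_final.csv` with `open(path, newline="")`, pass the handle to `load_rows`, and call `shape_matches` with its default arguments.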
Instructions:
This dataset can be used to build predictive machine learning models for early-stage heart disease detection.
Dataset Files
- heart_statlog_cleveland_hungary_final.csv (38.76 kB)
Open Access dataset files are accessible to all logged-in users. A free IEEE account is required; IEEE membership is not.
Documentation
| Attachment | Size |
|---|---|
| documentation.pdf | 410.81 kB |
Comments
ok
This dataset includes 272 duplicate records; notably, all of the Statlog data is already in the original dataset. Also, every value that was previously missing appears to have simply been set to 0. User beware.
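The comment reports 272 duplicates but does not say how they were identified. Below is a minimal sketch, assuming a "duplicate" means a row identical in every column (including the label), that drops repeats while keeping the first occurrence. Whether it reproduces the count of 272 on the real file has not been checked here, and the function name `drop_exact_duplicates` is hypothetical.

```python
from typing import List, Sequence, Tuple


def drop_exact_duplicates(
    rows: Sequence[Sequence[float]],
) -> Tuple[List[List[float]], int]:
    """Keep the first occurrence of each fully identical row.

    Returns the de-duplicated rows and the number of rows dropped.
    """
    seen = set()
    unique: List[List[float]] = []
    for row in rows:
        key = tuple(row)  # lists are unhashable; a tuple of numbers is hashable
        if key not in seen:
            seen.add(key)
            unique.append(list(row))
    return unique, len(rows) - len(unique)


rows = [[63, 1, 145], [63, 1, 145], [37, 1, 130]]
unique, removed = drop_exact_duplicates(rows)
print(unique, removed)  # [[63, 1, 145], [37, 1, 130]] 1
```

Keeping the first occurrence preserves the original row order, which makes it easy to trace which source file each surviving row came from.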
Hello, I will be working with this dataset soon. If possible, could you share everything you have found about its problems, in full and in clear terms?
Rabia Almamlook
Hey Jeremy, how would you suggest dealing with the missing data?
Thank you!
Thanks
The cholesterol column has 172 zeros (14.5% of rows).
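The percentage is the zero count divided by the row count: 172 / 1190 ≈ 0.145, or 14.5%. The sketch below produces the same summary for every column, assuming rows are already parsed into numbers. A zero is only suspicious where it is not a plausible measurement (serum cholesterol, for example); in coded categorical columns a 0 is a legitimate category, so the report has to be read column by column. The name `zero_report` and the synthetic data are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def zero_report(
    header: Sequence[str], rows: Sequence[Sequence[float]]
) -> Dict[str, Tuple[int, float]]:
    """Map each column that contains zeros to (zero_count, percent_of_rows)."""
    n_rows = len(rows)
    report: Dict[str, Tuple[int, float]] = {}
    for j, name in enumerate(header):
        zeros = sum(1 for row in rows if row[j] == 0)
        if zeros:
            report[name] = (zeros, round(100.0 * zeros / n_rows, 1))
    return report


# Synthetic data sized like the real file: 172 zeros among 1190 rows.
header = ["age", "cholesterol"]
rows = [[54, 0]] * 172 + [[54, 240]] * 1018
print(zero_report(header, rows))  # {'cholesterol': (172, 14.5)}
```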
ok
azhe
How should the zeros in the cholesterol column be handled?
Check for outliers first: if there are many, replace the zeros with the median; otherwise, use the mean.
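This advice leaves two things open: how outliers are detected and how many count as "too many". The sketch below fills both gaps with assumptions. Outliers are values outside 1.5 × IQR of the non-zero values (Tukey's fences), and "too many" means more than a hypothetical 5% of them. Zeros are treated as missing and excluded before any statistic is computed, so they do not drag the mean or median toward 0. The function name `impute_zeros` and the threshold are illustrative, not taken from the comment.

```python
import statistics
from typing import List, Sequence


def impute_zeros(
    values: Sequence[float], outlier_share_limit: float = 0.05
) -> List[float]:
    """Replace zeros (treated as missing) with the median or the mean.

    Outliers are counted among the non-zero values with the 1.5 * IQR rule.
    If their share exceeds outlier_share_limit (a hypothetical 5% default),
    the median is used because it is less sensitive to extreme values.
    """
    observed = [v for v in values if v != 0]
    if len(observed) < 2:
        raise ValueError("need at least two non-zero values to impute")
    q1, _, q3 = statistics.quantiles(observed, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    n_outliers = sum(1 for v in observed if v < low or v > high)
    if n_outliers / len(observed) > outlier_share_limit:
        fill = statistics.median(observed)
    else:
        fill = statistics.mean(observed)
    return [fill if v == 0 else v for v in values]


# No outliers among the non-zero values, so the zero gets the mean (215).
print(impute_zeros([200, 210, 220, 230, 0]))
# One outlier (1000) out of 9 values exceeds 5%, so the zero gets the median (100).
print(impute_zeros([100] * 8 + [1000, 0])[-1])
```

In the second example the mean of the non-zero values would be 200, pulled up by the single extreme value, which is the situation the "use the median" advice is meant to guard against.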