Cause of Death in the United States

Centers for Disease Control and Prevention
Alex Outman
Thu, 11/08/2018 - 10:34
Every year the CDC releases the country's most detailed report on death in the United States through the National Vital Statistics System. This mortality dataset records every death in the country in 2014 and includes detailed information about causes of death and the demographic background of the deceased.
It's been said that "statistics are human beings with the tears wiped off." This is especially true of this dataset. Each death record represents somebody's loved one, a life often bound up with a lifetime of memories and sometimes cut tragically short.
Putting the sensitive nature of the topic aside, analyzing mortality data is essential to understanding the complex circumstances of death across the country. The U.S. government uses this data to determine life expectancy and to understand how death in the U.S. differs from the rest of the world.


This dataset is a collection of tables and is available in both CSV and SQLite formats. It was reformatted from its original fixed-width DUSMCPUB file format into a relational structure.
Each row in the DeathRecords table is an individual death record. Each death record has a one-to-many relationship with the EntityAxisConditions and RecordAxisConditions tables via a DeathRecordId key. Both conditions tables contain ICD-10 codes that indicate the causes of death for each person. The difference is that EntityAxisConditions holds a sequential list of causes (in the order given on the death certificate), whereas RecordAxisConditions holds an unordered set of causes.
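The one-to-many relationship described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the dataset's actual schema: only the three table names and the DeathRecordId key come from the description, while the Id and Icd10Code column names, the assumption that row Id reflects sequence order, and the toy rows themselves are hypothetical stand-ins. Check the real SQLite file before relying on any of them.

```python
import sqlite3

# Only the table names and DeathRecordId are taken from the description;
# every other column name below is an assumption made for illustration.
CONDITION_TABLES = ("EntityAxisConditions", "RecordAxisConditions")


def build_toy_db():
    """Create an in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE DeathRecords (Id INTEGER PRIMARY KEY);
        CREATE TABLE EntityAxisConditions (
            Id INTEGER PRIMARY KEY,
            DeathRecordId INTEGER REFERENCES DeathRecords(Id),
            Icd10Code TEXT
        );
        CREATE TABLE RecordAxisConditions (
            Id INTEGER PRIMARY KEY,
            DeathRecordId INTEGER REFERENCES DeathRecords(Id),
            Icd10Code TEXT
        );
        """
    )
    # Toy rows with placeholder code strings; these are not real records.
    conn.executemany("INSERT INTO DeathRecords (Id) VALUES (?)", [(1,), (2,)])
    conn.executemany(
        "INSERT INTO EntityAxisConditions (DeathRecordId, Icd10Code) VALUES (?, ?)",
        [(1, "I219"), (1, "I251"), (2, "C349")],
    )
    conn.executemany(
        "INSERT INTO RecordAxisConditions (DeathRecordId, Icd10Code) VALUES (?, ?)",
        [(1, "I251"), (1, "I219"), (2, "C349")],
    )
    return conn


def conditions_per_record(conn, table):
    """Map each death record Id to its codes from one conditions table."""
    if table not in CONDITION_TABLES:
        raise ValueError(f"unknown conditions table: {table}")
    # The table name is checked against a whitelist above, so formatting it
    # into the SQL string is safe here.
    rows = conn.execute(
        f"""
        SELECT d.Id, c.Icd10Code
        FROM DeathRecords AS d
        JOIN {table} AS c ON c.DeathRecordId = d.Id
        ORDER BY d.Id, c.Id
        """
    )
    grouped = {}
    for record_id, code in rows:
        grouped.setdefault(record_id, []).append(code)
    return grouped


conn = build_toy_db()
# Entity axis: keep the list, because the order of causes is meaningful.
entity_axis = conditions_per_record(conn, "EntityAxisConditions")
# Record axis: convert to sets, because the causes are unordered.
record_axis = {
    record_id: set(codes)
    for record_id, codes in conditions_per_record(conn, "RecordAxisConditions").items()
}
```

Keeping the entity-axis causes as lists and the record-axis causes as sets mirrors the distinction drawn above: comparing two entity-axis results is order-sensitive, while comparing two record-axis results is not.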