Age-stratified Covid-19 case fatality rates (CFRs): different countries and longitudinal
We point out an instantiation of Simpson's paradox in Covid-19 case fatality rates (CFRs): comparing data on 44,672 cases from China with early reports from Italy (9th March), we find that CFRs are lower in Italy for every age group, but higher overall. This phenomenon is explained by a stark difference in case demographics between the two countries. Using this as a motivating example, we introduce basic concepts from mediation analysis and show how these can be used to quantify different direct and indirect effects when assuming a coarse-grained causal graph involving country, age, and mortality. As a case study, we then investigate total, direct, and indirect (age-mediated) causal effects between different countries and at different points in time. This allows us to separate age-related effects from others unrelated to age, and thus facilitates a more transparent comparison of CFRs across countries throughout the evolution of the Covid-19 pandemic.
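To make the paradox and the three effects concrete, here is a minimal sketch in Python. It assumes the coarse-grained graph described above (country influences age of cases, and both influence mortality) with no further confounding, and uses the standard definitions of total, natural direct, and natural indirect effects. The two countries "A" and "B", the three age groups, and all numbers are invented for illustration and are not taken from the datasets; the notebook in this repository may organise the computation differently.

```python
# Toy numbers, invented for illustration only (NOT taken from the datasets).
AGE_GROUPS = ["0-49", "50-69", "70+"]

# P(age group | country): share of confirmed cases in each age group.
case_share = {
    "A": [0.6, 0.3, 0.1],  # mostly younger cases
    "B": [0.2, 0.3, 0.5],  # mostly older cases
}
# P(death | age group, country): age-specific case fatality rate.
cfr = {
    "A": [0.005, 0.030, 0.150],
    "B": [0.004, 0.025, 0.120],
}


def expected_cfr(age_country, cfr_country):
    """CFR under the case age distribution of `age_country`
    combined with the age-specific CFRs of `cfr_country`."""
    return sum(p * r for p, r in zip(case_share[age_country], cfr[cfr_country]))


def total_effect(src, dst):
    # Change both the age distribution and the age-specific CFRs.
    return expected_cfr(dst, dst) - expected_cfr(src, src)


def natural_direct_effect(src, dst):
    # Keep the age distribution of src; change only the age-specific CFRs.
    return expected_cfr(src, dst) - expected_cfr(src, src)


def natural_indirect_effect(src, dst):
    # Keep the age-specific CFRs of src; change only the age distribution.
    return expected_cfr(dst, src) - expected_cfr(src, src)


# Simpson's paradox: B is lower in every age group, yet higher overall.
lower_in_every_group = all(b < a for a, b in zip(cfr["A"], cfr["B"]))
higher_overall = expected_cfr("B", "B") > expected_cfr("A", "A")

tce = total_effect("A", "B")
nde = natural_direct_effect("A", "B")
nie = natural_indirect_effect("A", "B")

print(f"lower in every age group: {lower_in_every_group}")
print(f"higher overall:           {higher_overall}")
print(f"total effect A->B:        {tce:+.4f}")
print(f"direct effect A->B:       {nde:+.4f}")
print(f"indirect effect A->B:     {nie:+.4f}")
```

In this toy example the direct effect is negative (age-specific CFRs are lower in B) while the indirect, age-mediated effect is positive and larger in magnitude, which is what produces the reversal in the overall CFR. The direct and indirect effects do not simply add up to the total effect in general; instead, the total effect from A to B equals the direct effect from A to B minus the indirect effect from B to A.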
This repository contains:
- age-stratified Covid-19 case and fatality data for different countries and at different points in time, and
- an interactive Jupyter notebook for mediation analysis of age-related causal effects on case fatality rates,
published as part of the following paper:
"Simpson's paradox in Covid-19 case fatality rates: a mediation analysis of age-related causal effects". J von Kügelgen*, L Gresele*, B Schölkopf. (*equal contribution). https://arxiv.org/abs/2005.07180
We provide the following three separate datasets:
- a dataset containing only the most recent numbers from: Argentina, China, Colombia, Italy, Netherlands, Portugal, South Africa, Spain, Sweden, Switzerland, South Korea and the Diamond Princess cruise ship (last checked: end of May 2020)
- a longitudinal dataset containing several reports from Italy (9 March - 26 May 2020)
- a longitudinal dataset containing several reports from Spain (22 March - 29 May 2020)
All numbers of confirmed cases and fatalities are stratified by age into groups of 10 years (0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80+). Each entry also includes the date and country of reporting, as well as links to the corresponding sources (generally health agencies/ministries, or scientific publications).
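As a rough illustration of how such age-stratified counts relate to the quantities used in the analysis, the sketch below derives case shares, age-specific CFRs, and the overall CFR for a single report. The record layout (`AgeRecord` and its field names), the country name, and all counts are hypothetical placeholders; the actual file and column names in data_and_code.zip may differ.

```python
from dataclasses import dataclass

# Age groups as described above.
AGE_GROUPS = ["0-9", "10-19", "20-29", "30-39", "40-49",
              "50-59", "60-69", "70-79", "80+"]


@dataclass
class AgeRecord:
    """Hypothetical layout of one age-stratified entry (field names are assumptions)."""
    country: str
    date: str
    age_group: str
    cases: int
    fatalities: int


# Placeholder counts, invented for illustration (NOT real data).
cases = [100, 200, 400, 500, 600, 700, 600, 500, 400]
fatalities = [0, 0, 1, 2, 5, 14, 36, 75, 100]
records = [
    AgeRecord("Exampleland", "2020-05-31", group, c, f)
    for group, c, f in zip(AGE_GROUPS, cases, fatalities)
]

total_cases = sum(r.cases for r in records)
total_fatalities = sum(r.fatalities for r in records)

# Share of confirmed cases per age group, P(age group | country).
case_share = {r.age_group: r.cases / total_cases for r in records}
# Age-specific CFR, P(death | age group, country); undefined if there are no cases.
age_cfr = {
    r.age_group: (r.fatalities / r.cases if r.cases else float("nan"))
    for r in records
}
# Overall CFR: total fatalities over total confirmed cases.
overall_cfr = total_fatalities / total_cases

print(f"overall CFR: {overall_cfr:.4f}")
for group in AGE_GROUPS:
    print(f"{group:>5}: share={case_share[group]:.3f}  CFR={age_cfr[group]:.4f}")
```

The overall CFR equals the case-share-weighted average of the age-specific CFRs, which is why two countries with similar age-specific CFRs can still differ markedly in their overall CFR.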
Please consult the paper and notebook for further details.
- data_and_code.zip (290.05 kB)