Standard Dataset
Results from Study "How good are we at extracting information from graphs of variance and uncertainty?"
- Submitted by: Andrew Hill
- Last updated: Wed, 01/08/2025 - 13:28
- DOI: 10.21227/vgjq-2712
Abstract
Decision-makers must make decisions under uncertainty. In the era of data analytics and "data-driven" decision-making, decision-makers therefore need to understand the risk, variance and uncertainty in the data they are given. While there has been a sizable research effort on how to better communicate risk to lay people, the field of uncertainty visualization is much less developed, and many questions remain about how best to visualize variance and uncertainty.
Therefore, in this study we investigated how successfully people can extract key metrics (such as averages and percentiles) from common graphs of variance and uncertainty, including histograms, scatterplots, boxplots, violin plots, probability density functions and cumulative distribution functions.
As expected, participants correctly extracted averages, minimums and maximums more often than percentiles or p-values. The scatterplot was the most consistent performer across all questions. However, the overall extraction success rate was poor, with only around 50% of questions answered correctly. Even with common graphs such as histograms and scatterplots, only around half of participants could extract basic metrics such as the average and maximum values.
Participants especially struggled to interpret the data for decision-making. Few could combine central tendency and variance to make an optimal choice, or recognize that two completely separated distributions meant that one variable was always higher than the other. In conclusion, many people appear to struggle with common graphs of variance and uncertainty, and the current graph types are not intuitive. More research is needed to identify better ways to communicate variance and uncertainty to lay people.
The three datasets in the Study Data.zip archive need to be extracted and saved. These CSV files are the input files for the R code that conducts the statistical analysis. The data dictionary provides more detail on the data collected.
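The extraction step can also be scripted. The following is a minimal sketch using only the Python standard library; it is not part of the published files. The function names `extract_study_csvs` and `summarize_csv` are hypothetical, and the sketch assumes the archive is the `Study Data.zip` file listed under Dataset Files, that it is trusted, and that its CSVs are UTF-8 (possibly with a byte-order mark). Where the R scripts expect the files to live is not stated here, so check the paths set in Exp1_Analysis.R and Exp2_Analysis.R before choosing an output folder.

```python
import csv
import zipfile
from pathlib import Path


def extract_study_csvs(zip_path, out_dir):
    """Extract only the CSV members of the archive and return their paths.

    Assumes a trusted archive whose member names contain no '..' parts;
    the folder structure inside the archive is kept as-is.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        # Keep real CSVs; skip macOS metadata copies such as
        # __MACOSX/._data.csv that some zip tools add.
        members = [
            m for m in zf.namelist()
            if m.lower().endswith(".csv")
            and not m.startswith("__MACOSX/")
            and not Path(m).name.startswith("._")
        ]
        zf.extractall(out, members=members)
    return sorted(out / m for m in members)


def summarize_csv(path):
    """Return the file name, column names and number of data rows of one CSV."""
    # utf-8-sig also strips a byte-order mark (the encoding is an assumption).
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        n_rows = sum(1 for _ in reader)
        columns = list(reader.fieldnames or [])
    return {"file": Path(path).name, "columns": columns, "rows": n_rows}
```

For example, `extract_study_csvs("Study Data.zip", "study_data")` extracts the CSVs into a `study_data` folder. Since this page describes three datasets, the returned list should contain three paths, and calling `summarize_csv` on each gives a quick check of column names and row counts against the data dictionary before running the R scripts.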
Dataset Files
- Study Data.zip (46.85 kB): CSV files for use with the provided R scripts.
- Exp1_Analysis.R (14.77 kB): code to run the statistical analysis for Experiment 1.
- Exp2_Analysis.R (7.42 kB): code to run the statistical analysis for Experiment 2.
Documentation
| Attachment | Size |
|---|---|
| Data dictionary | 16.73 kB |