COVID vaccine and deaths from Delta variant

Application exercise

The main question we’ll explore today is “How do deaths from COVID cases compare between vaccinated and unvaccinated?”

What do you think?

Goals

  • Creating data visualizations and calculating summary statistics for comparing trends across groups

  • Distinguishing observational studies and experiments

  • Reviewing various sampling methods

  • Identifying confounding variables and Simpson’s paradox

Packages

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Data

The data for this case study come from a technical briefing published by Public Health England in August 2021 on COVID cases, vaccinations, and deaths from the Delta variant.

First, let’s load the data:

delta <- read_csv("data/delta.csv")
Rows: 268166 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): vaccine, age, outcome

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Visualizing and summarizing categorical data

Exercise 1

How many rows and columns are in this dataset? Answer in a full sentence using inline code. What does each row represent and what does each column represent? For each variable, identify its type.

Add your answer here.

Exercise 2

Do these data come from an observational study or experiment? Why?

Add your answer here.

Exercise 3

Create a visualization of health outcome by vaccine status that allows you to compare the proportion of deaths across those who are and are not vaccinated. What can you say about death rates in these two groups based on this visualization?

Add your answer here.

# add code here

Exercise 4

Calculate the proportion of deaths in among those who are vaccinated. Then, calculate the proportion among those who are not vaccinated.

Add your answer here.

# add code here

Exercise 5

Create the visualization and calculate proportions from the two previous exercises, this time controlling for age. How do the proportions compare?

Add your answer here.

# add code here

Exercise 6

Based on your findings so far, fill in the blanks with more, less, or equally: Is there anything surprising about these statements? Speculate on what, if anything, the discrepancy might be due to.

  • In 2021, among those in the UK who were COVID Delta cases, the vaccinated were ___ likely to die than the unvaccinated.

  • For those under 50, those who were unvaccinated were ___ likely to die than those who were vaccinated.

  • For those 50 and up, those who were unvaccinated were ___ likely to die than those who were vaccinated.

Add your answer here.

Simpson’s Paradox

Simpson’s paradox is a phenomenon in which a trend appears in subsets of the data, but disappears or reverses when the subsets are combined. The paradox can be resolved when confounding variables and causal relations are appropriately addressed in the analysis.

Exercise 7

Let’s rephrase the previous question which asked you to speculate on why deaths among vaccinated cases overall is higher while deaths among unvaccinated cases are higher when we split the data into two groups (below 50 and 50 and up). What might be the confounding variable in the relationship between vaccination and deaths?

Add your answer here.

Exercise 8

Visualize and describe the distribution of seniors (50 and up) based on (a.k.a. conditional on) vaccination status. Hint: Your description will benefit from calculating proportions of seniors in each of the vaccination groups and working those values into your narrative.

Add your answer here.

# add your code here

Summary

The percentages of deaths from COVID across vaccination groups is as follows:

# add code here

Also considering age groups, the death rates are as follows:

# add code here

We can pivot these data for better display; we’ll learn more about these “data moves” soon:

# add code here

We identified age as a potential confounding variable in the relationship between. So let’s take a look at the distribution of age in the data:

# add code here

And then, let’s use these proportions to weigh the percentages of deaths.

# add code here

Revisiting the question we posed to start with: How do deaths from COVID cases compare between vaccinated and unvaccinated?

Acknowledgements

This case study is inspired by Statistical Literacy: Simpson’s Paradox and Covid Deaths by Milo Schield.