library(tidyverse)
library(tidymodels)
library(openintro)
Donors and yawners
In this application exercise, we’ll introduce pipelines for conducting hypothesis tests with randomization.
Goals
Conduct a hypothesis test for a proportion
Conduct a hypothesis test for a difference in proportions
Packages and data
We’ll use the tidyverse and tidymodels packages as usual and the openintro package for the datasets.
Case study 1: Donors
The first dataset we’ll use is organ_donors
, which is in your data
folder:
<- read_csv("data/organ-donor.csv") organ_donor
Rows: 62 Columns: 1
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): outcome
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The hypotheses we are testing are:
\(H_0: p = 0.10\)
\(H_A: p \ne 0.10\)
where \(p\) is the true complication rate for this consultant.
Exercise 1
Construct the null distribution with 100 resamples. Name it null_dist_donor
. How many rows does null_dist_donor
have? How many columns? What does each row and each column represent?
# add code here
Add response here.
Exercise 2
Where do you expect the center of the null distribution to be? Visualize it to confirm.
# option 1
# add code here
# option 2
# add code here
Exercise 2
Calculate the observed complication rate of this consultant. Name it obs_stat_donor
.
# add code here
Exercise 3
Overlay the observed statistic on the null distribution and comment on whether an observed outcome as extreme as the observed statistic, or lower, is a likely or unlikely outcome, if in fact the null hypothesis is true.
# option 1
# add code here
# option 2
# add code here
Exercise 4
Calculate the p-value and comment on whether it provides convincing evidence that this consultant incurs a lower complication rate than 10% (overall US complication rate).
Add response here.
# option 1
# add code here
# option 2
# add code here
Exercise 5
Let’s get real! Redo the test with 15,000 simulations. Note: This can take some time to run.
# add code here
Case study 2: Yawners
Exercise 6
Using the yawn
dataset in the openintro package, conduct a hypothesis test for evaluating whether yawning is contagious. First, set the hypotheses. Then, conduct a randomization test using 1000 simulations. Visualize and calculate the p-value and use it to make a conclusion about the statistical discernability of the difference in proportions of yawners in the two groups. Then, comment on whether you “buy” this conclusion.
Add response here.
# add code here