library(tidyverse)
library(tidymodels)
library(openintro)
library(ggthemes)
Birth weights
In this application exercise, we’ll do inference for comparing two means, using simulation-based methods and mathematical models.
Packages
We’ll use the tidyverse, tidymodels, openintro, and ggthemes packages.
Data
Every year, the United States Department of Health and Human Services releases to the public a large dataset containing information on births recorded in the country. This dataset has been of interest to medical researchers who are studying the relation between habits and practices of expectant mothers and the birth of their children. In this case study we work with a random sample of 1,000 cases from the dataset released in 2014. The length of pregnancy, measured in weeks, is commonly referred to as gestation.
glimpse(births14)
Rows: 1,000
Columns: 13
$ fage <int> 34, 36, 37, NA, 32, 32, 37, 29, 30, 29, 30, 34, 28, 28,…
$ mage <dbl> 34, 31, 36, 16, 31, 26, 36, 24, 32, 26, 34, 27, 22, 31,…
$ mature <chr> "younger mom", "younger mom", "mature mom", "younger mo…
$ weeks <dbl> 37, 41, 37, 38, 36, 39, 36, 40, 39, 39, 42, 40, 40, 39,…
$ premie <chr> "full term", "full term", "full term", "full term", "pr…
$ visits <dbl> 14, 12, 10, NA, 12, 14, 10, 13, 15, 11, 14, 16, 20, 15,…
$ gained <dbl> 28, 41, 28, 29, 48, 45, 20, 65, 25, 22, 40, 30, 31, NA,…
$ weight <dbl> 6.96, 8.86, 7.51, 6.19, 6.75, 6.69, 6.13, 6.74, 8.94, 9…
$ lowbirthweight <chr> "not low", "not low", "not low", "not low", "not low", …
$ sex <chr> "male", "female", "female", "male", "female", "female",…
$ habit <chr> "nonsmoker", "nonsmoker", "nonsmoker", "nonsmoker", "no…
$ marital <chr> "married", "married", "married", "not married", "marrie…
$ whitemom <chr> "white", "white", "not white", "white", "white", "white…
Note that there are some NA values in the habit variable.
births14 |>
  count(habit)
# A tibble: 3 × 2
habit n
<chr> <int>
1 nonsmoker 867
2 smoker 114
3 <NA> 19
Let’s drop those rows, since observations with a missing habit can’t be used in our analysis.
births14 <- births14 |>
  drop_na(habit)
Randomization test for comparing two means
Exercise 1
State the hypotheses for testing whether there is a difference between the mean birth weight of babies born to mothers who are smokers and that of babies born to mothers who are not smokers.
Add answer here.
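One common way to write these hypotheses in symbols is shown below. It assumes a two-sided alternative, because the exercise asks about any difference. It writes μ for the population mean birth weight in each group, which is notation not used elsewhere in this exercise.

```latex
\begin{aligned}
H_0 &: \mu_{\text{smoker}} - \mu_{\text{nonsmoker}} = 0 \\
H_A &: \mu_{\text{smoker}} - \mu_{\text{nonsmoker}} \neq 0
\end{aligned}
```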
Exercise 2
Calculate the observed difference between the mean birth weight of babies born to mothers who are smokers and those born to mothers who are not smokers in this sample.
# add code here
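Below is a minimal sketch of computing the observed statistic with the infer verbs that load with tidymodels. It assumes the difference is taken as smoker minus nonsmoker. The drop_na() step repeats the cleaning above so the block runs on its own.

```r
library(tidyverse)
library(tidymodels)
library(openintro)

# Drop births with a missing smoking habit (a no-op if already done above)
births14 <- births14 |>
  drop_na(habit)

# Observed statistic: mean weight for smokers' babies minus nonsmokers' babies
obs_diff <- births14 |>
  specify(weight ~ habit) |>
  calculate(stat = "diff in means", order = c("smoker", "nonsmoker"))

obs_diff
```

Reversing the order argument flips the sign of the statistic. Whichever order you choose, use the same one in every later step.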
Exercise 3
Suppose the birth weights of the babies in this sample are written on pieces of paper. Explain how you would conduct the randomization test tactilely. Hint: You may need to calculate some summary statistics first.
# add code here
Add answer here.
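The group sizes from the count above (867 nonsmokers and 114 smokers after dropping the NA rows) determine how the cards are dealt. You would shuffle all 981 weight cards, deal 114 into a "smoker" pile and 867 into a "nonsmoker" pile, and record the difference in pile means. Repeating this many times builds the null distribution. The sketch below mimics a single shuffle in code; the seed is arbitrary.

```r
library(tidyverse)
library(openintro)

births14 <- births14 |>
  drop_na(habit)

# Summary statistics for the tactile setup: how many cards go in each pile
group_sizes <- births14 |>
  count(habit)
group_sizes

# One shuffle: permuting the habit labels is equivalent to dealing the
# weight cards into piles of the original group sizes
set.seed(123)
one_shuffle <- births14 |>
  mutate(habit_shuffled = sample(habit)) |>
  group_by(habit_shuffled) |>
  summarise(mean_weight = mean(weight, na.rm = TRUE))

# Difference in shuffled pile means (smoker minus nonsmoker)
shuffled_diff <- one_shuffle$mean_weight[one_shuffle$habit_shuffled == "smoker"] -
  one_shuffle$mean_weight[one_shuffle$habit_shuffled == "nonsmoker"]
shuffled_diff
```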
Exercise 4
Construct and visualize the null distribution. Based on your visualization, speculate on whether the p-value will be small or large.
# add code here
Add answer here.
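A minimal sketch of a permutation null distribution using infer is shown below. The choice of 1,000 repetitions and the seed are arbitrary, not values from the original. The pipeline assumes the same smoker minus nonsmoker order used for the observed statistic.

```r
library(tidyverse)
library(tidymodels)
library(openintro)

births14 <- births14 |>
  drop_na(habit)

set.seed(1234)  # reproducible shuffles

# Under H0, habit is independent of weight, so labels can be permuted
null_dist <- births14 |>
  specify(weight ~ habit) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  calculate(stat = "diff in means", order = c("smoker", "nonsmoker"))

# Histogram of the 1,000 permuted differences in means
visualize(null_dist)
```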
Exercise 5
Visualize and calculate the p-value. At the 5% discernibility level, what is the conclusion of the hypothesis test?
# add code here
Add answer here.
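The sketch below recomputes the observed statistic and the null distribution so it runs on its own, then shades and computes a two-sided p-value. It assumes the two-sided alternative stated in the hypotheses. Compare the resulting p_value with 0.05 to reach a conclusion.

```r
library(tidyverse)
library(tidymodels)
library(openintro)

births14 <- births14 |>
  drop_na(habit)

# Observed difference in means (smoker minus nonsmoker)
obs_diff <- births14 |>
  specify(weight ~ habit) |>
  calculate(stat = "diff in means", order = c("smoker", "nonsmoker"))

# Permutation null distribution
set.seed(1234)
null_dist <- births14 |>
  specify(weight ~ habit) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  calculate(stat = "diff in means", order = c("smoker", "nonsmoker"))

# Shade both tails at least as extreme as the observed statistic
visualize(null_dist) +
  shade_p_value(obs_stat = obs_diff, direction = "two-sided")

# Proportion of permuted statistics at least as extreme as observed
p_val <- null_dist |>
  get_p_value(obs_stat = obs_diff, direction = "two-sided")
p_val
```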
Bootstrap interval for the difference of two means
Exercise 6
Construct and interpret a confidence interval, at the confidence level equivalent to the previous hypothesis test, for the difference between the mean weight of babies born to mothers who are smokers and that of babies born to mothers who are not smokers.
# add code here
Add response here.
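A two-sided test at the 5% discernibility level corresponds to a 95% confidence interval. A minimal bootstrap sketch with a percentile interval is shown below; the number of resamples and the seed are arbitrary choices.

```r
library(tidyverse)
library(tidymodels)
library(openintro)

births14 <- births14 |>
  drop_na(habit)

set.seed(4321)

# Resample births with replacement within the data, recomputing the statistic
boot_dist <- births14 |>
  specify(weight ~ habit) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "diff in means", order = c("smoker", "nonsmoker"))

# 95% percentile interval for mu_smoker - mu_nonsmoker
ci_95 <- boot_dist |>
  get_ci(level = 0.95, type = "percentile")
ci_95

visualize(boot_dist) +
  shade_ci(endpoints = ci_95)
```

If the interval excludes 0, that agrees with rejecting H0 at the 5% level. If it contains 0, it agrees with failing to reject.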
Inference using mathematical models for comparing two means
Exercise 7
Check that the technical conditions for conducting inference using mathematical models for comparing two means are met for these data.
Add response here.
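The usual conditions are independence and approximate normality in each group. Independence is supported here by the random sampling described in the Data section. Normality is commonly checked with a rule of thumb such as at least 30 observations per group and no particularly extreme outliers. The sketch below computes group sizes and spreads and plots each group's distribution; the histogram bin width is an arbitrary choice.

```r
library(tidyverse)
library(openintro)

births14 <- births14 |>
  drop_na(habit)

# Sample size, mean, and standard deviation of birth weight in each group
group_summary <- births14 |>
  group_by(habit) |>
  summarise(
    n = n(),
    mean_weight = mean(weight, na.rm = TRUE),
    sd_weight = sd(weight, na.rm = TRUE)
  )
group_summary

# One histogram per group, to look for skew or extreme outliers
ggplot(births14, aes(x = weight)) +
  geom_histogram(binwidth = 0.5) +
  facet_wrap(~habit, ncol = 1, scales = "free_y") +
  labs(x = "Birth weight (pounds)", y = "Count")
```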
Exercise 8
Conduct a hypothesis test using mathematical models for the hypotheses you set in Exercise 1.
# add code here
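One option is infer's t_test() wrapper, which runs a two-sample t-test and returns a tidy one-row tibble. The sketch below assumes the same smoker minus nonsmoker order and the two-sided alternative used above. Base R's t.test() would also work, but it orders groups by factor level (nonsmoker first), so the sign of its estimate is reversed.

```r
library(tidyverse)
library(tidymodels)
library(openintro)

births14 <- births14 |>
  drop_na(habit)

# Two-sample t-test for mu_smoker - mu_nonsmoker = 0
t_res <- t_test(
  births14,
  weight ~ habit,
  order = c("smoker", "nonsmoker"),
  alternative = "two-sided"
)
t_res
```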
Improving visualizations
Exercise 9
Improve the density plot from the lecture slides.
# add code here
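The lecture plot is not reproduced here, so this sketch assumes it showed density curves of weight colored by habit. The suggested improvements are overlapping semi-transparent densities, a colorblind-friendly palette from ggthemes, and informative labels. The title and subtitle wording are illustrative only.

```r
library(tidyverse)
library(openintro)
library(ggthemes)

births14 <- births14 |>
  drop_na(habit)

density_plot <- ggplot(births14, aes(x = weight, fill = habit)) +
  geom_density(alpha = 0.5) +     # transparency so both curves stay visible
  scale_fill_colorblind() +       # colorblind-friendly palette from ggthemes
  labs(
    x = "Birth weight (pounds)",
    y = "Density",
    fill = "Mother's smoking habit",
    title = "Birth weights of babies born to smoking and nonsmoking mothers",
    subtitle = "Random sample of births recorded in the US in 2014"
  ) +
  theme_minimal()

density_plot
```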