Birth weights

Application exercise

In this application exercise, we’ll do inference for a comparing two means, using simulation-based and mathematical models.

Packages

We’ll use the tidyverse, tidymodels, openintro, and ggthemes packages.

library(tidyverse)
library(tidymodels)
library(openintro)
library(ggthemes)

Data

Every year, the United States Department of Health and Human Services releases to the public a large dataset containing information on births recorded in the country. This dataset has been of interest to medical researchers who are studying the relation between habits and practices of expectant mothers and the birth of their children. In this case study we work with a random sample of 1,000 cases from the dataset released in 2014. The length of pregnancy, measured in weeks, is commonly referred to as gestation.

glimpse(births14)
Rows: 1,000
Columns: 13
$ fage           <int> 34, 36, 37, NA, 32, 32, 37, 29, 30, 29, 30, 34, 28, 28,…
$ mage           <dbl> 34, 31, 36, 16, 31, 26, 36, 24, 32, 26, 34, 27, 22, 31,…
$ mature         <chr> "younger mom", "younger mom", "mature mom", "younger mo…
$ weeks          <dbl> 37, 41, 37, 38, 36, 39, 36, 40, 39, 39, 42, 40, 40, 39,…
$ premie         <chr> "full term", "full term", "full term", "full term", "pr…
$ visits         <dbl> 14, 12, 10, NA, 12, 14, 10, 13, 15, 11, 14, 16, 20, 15,…
$ gained         <dbl> 28, 41, 28, 29, 48, 45, 20, 65, 25, 22, 40, 30, 31, NA,…
$ weight         <dbl> 6.96, 8.86, 7.51, 6.19, 6.75, 6.69, 6.13, 6.74, 8.94, 9…
$ lowbirthweight <chr> "not low", "not low", "not low", "not low", "not low", …
$ sex            <chr> "male", "female", "female", "male", "female", "female",…
$ habit          <chr> "nonsmoker", "nonsmoker", "nonsmoker", "nonsmoker", "no…
$ marital        <chr> "married", "married", "married", "not married", "marrie…
$ whitemom       <chr> "white", "white", "not white", "white", "white", "white…

Note that there are some NAs in the habit variable.

births14 |>
  count(habit)
# A tibble: 3 × 2
  habit         n
  <chr>     <int>
1 nonsmoker   867
2 smoker      114
3 <NA>         19

Let’s drop those since we can’t use those observations in our analysis.

births14 <- births14 |>
  drop_na(habit)

Randomization test for comparing two means

Exercise 1

Set the hypotheses for testing if there is a difference between mean birth weight of babies born to mothers who are smokers and those born to mothers who are not smokers.

Add answer here.

Exercise 2

Calculate the observed difference between the mean birth weight of babies born to mothers who are smokers and those born to mothers who are not smokers in this sample.

# add code here

Exercise 3

Suppose the birth weights of the babies in this sample are written on pieces of paper. Explain how you would conduct the randomization test tactically. Hint: You may need to calculate some summary statistics first.

# add code here

Add answer here.

Exercise 4

Construct and visualize the null distribution. Based on your visualization, speculate on whether the p-value will be small or large.

# add code here

Add answer here.

Exercise 5

Visualize and calculate the p-value. At the 5% discernability level, what is the conclusion of the hypothesis test?

# add code here

Add answer here.

Bootstrap interval for the difference of two means

Exercise 6

Construct and interpret confidence interval at the equivalent level as the previous hypothesis test for the difference between the mean weight of babies born to mothers who are smokers and those who are bor to mothers who are not smokers.

# add code here

Add response here.

Inference using mathematical models for comparing two means

Exercise 7

Check that the technical conditions for conducting inference using mathematical models for comparing two means are met for these data.

Add response here.

Exercise 8

Conduct a hypothesis test using mathematical models for the hypotheses you set in Section 3.1.

# add code here

Improving visualizations

Exercise 9

Improve the density plot from the lecture slides.

# add code here