Lab 6

By the end of this lab you will construct and interpret confidence intervals via bootstrapping.

For all visualizations you create, be sure to include informative titles for the plot, axes, and legend!

Getting started

  1. Go to Posit Cloud and start the project titled lab-6 - Confidence intervals.

  2. Under the Files tab on the lower right, click on lab-6.qmd to open the lab template.

  3. Complete the exercises in this document.

Part 1: Lemurs

This week we’ll be working with data from the Duke Lemur Center, which houses over 200 lemurs across 14 species – the most diverse population of lemurs on Earth, outside their native Madagascar.

Lemurs are the most threatened group of mammals on the planet, and 95% of lemur species are at risk of extinction. Our mission is to learn everything we can about lemurs – because the more we learn, the better we can work to save them from extinction. They are endemic only to Madagascar, so it’s essentially a one-shot deal: once lemurs are gone from Madagascar, they are gone from the wild.

By studying the variables that most affect their health, reproduction, and social dynamics, the Duke Lemur Center learns how to most effectively focus their conservation efforts. And the more we learn about lemurs, the better we can educate the public around the world about just how amazing these animals are, why they need to be protected, and how each and every one of us can make a difference in their survival.

Source: TidyTuesday

The codebook for the dataset can be found here. While the TidyTuesday project used the full dataset, we’ll be working with a subset, which can be found in the data folder.

First let’s load the packages.

library(tidyverse)
library(tidymodels)

Exercise 1

Load the lemurs data from your data folder and save it as lemurs. Then, report which “types” of lemurs are represented in the sample and how many of each. Note that this information is in the taxon variable. You should refer back to the linked data dictionary to understand what the different values of taxon mean.
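
As a rough illustration of the counting step only, the sketch below tallies the taxon column of a small made-up tibble. The name toy_lemurs and all of its values are hypothetical stand-ins (only the code LCAT comes from this lab; the other codes are placeholders), and you will still need to read the real file from your data folder yourself.

```r
library(tidyverse)

# Hypothetical toy data; NOT the lab dataset.
toy_lemurs <- tibble(
  taxon    = c("LCAT", "LCAT", "EMON", "PCOQ", "EMON", "LCAT"),
  weight_g = c(2200, 2350, 1400, 3900, 1500, 2100)
)

# count() returns one row per distinct taxon, with the tally in column n.
taxon_counts <- toy_lemurs |>
  count(taxon, sort = TRUE)

taxon_counts
```

The same count() call should work on lemurs once it is loaded, provided the column is named taxon as described above.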

Pause and render

Now that you’ve completed your first exercise, pause and render your document. If it renders without any issues, great! Move on to the next exercise. If it does not, debug the issue before moving on. Ask your TA for help if you need it. Do not proceed without rendering your document.

Exercise 2

What is the mean weight of ring-tailed lemurs? Calculate, visualize, and interpret a 95% bootstrap confidence interval together with your point estimate. Use set.seed(123) and reps = 1000 to create your bootstrap confidence interval.

Hints:

Below is a step-by-step recipe for constructing and visualizing a confidence interval. The code snippets shown are not “complete”; they’re intended to guide you in the right direction.

  • Step 1: Create a dataset of ring-tailed lemurs, and call it lemurs_rt. Note, these are taxon == "LCAT".

  • Step 2: Calculate the mean weight of ring-tailed lemurs.

obs_stat_mean_rt <- lemurs_rt |>
  specify(response = weight_g) |>
  calculate(stat = "mean")

  • Step 3: Construct a bootstrap distribution. Note: There is a generate() step, so you also need to set a seed before running the following.

boot_dist_mean_rt <- lemurs_rt |>
  specify(response = weight_g) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "mean")

  • Step 4: Calculate the bounds of the confidence interval. Don’t forget to interpret it as well!

ci_95_mean_rt <- boot_dist_mean_rt |>
  get_confidence_interval(
    point_estimate = obs_stat_mean_rt,
    level = 0.95
  )

  • Step 5: Visualize the confidence interval, overlaying it on the bootstrap distribution.

visualize(boot_dist_mean_rt) +
  shade_confidence_interval(endpoints = ci_95_mean_rt)
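
To see why the seed matters in Step 3, here is a minimal sketch on made-up numbers. The tibble toy_rt and the helper boot_means() are hypothetical names invented for this illustration, not part of the lab template. It runs the same bootstrap twice with the same seed and checks that the resampled means match. It also compares the percentile interval from get_confidence_interval() with the 2.5th and 97.5th percentiles of the bootstrap means, on the assumption that the percentile method is computed this way.

```r
library(tidyverse)
library(tidymodels)

# Hypothetical toy weights in grams; these are NOT the lab data.
toy_rt <- tibble(
  weight_g = c(2100, 2250, 2400, 1980, 2310, 2190, 2500, 2050, 2275, 2350)
)

# Helper made up for this sketch: set the seed, then resample.
boot_means <- function(data, seed) {
  set.seed(seed)  # must run before generate(), which draws the resamples
  data |>
    specify(response = weight_g) |>
    generate(reps = 1000, type = "bootstrap") |>
    calculate(stat = "mean")
}

boot_a <- boot_means(toy_rt, seed = 123)
boot_b <- boot_means(toy_rt, seed = 123)

# Same seed, same resamples, so the two sets of means should be identical.
same_draws <- identical(boot_a$stat, boot_b$stat)

# Percentile interval from infer vs. quantiles taken by hand.
ci_infer  <- get_confidence_interval(boot_a, level = 0.95, type = "percentile")
ci_manual <- quantile(boot_a$stat, probs = c(0.025, 0.975))
```

If you rerun a chunk without resetting the seed, the interval endpoints will usually shift slightly, which is why each exercise asks for set.seed(123).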

Exercise 3

What is the median weight of ring-tailed lemurs? What is the median weight of mongoose lemurs? Report a 95% bootstrap confidence interval together with your point estimates. Use set.seed(123) and reps = 10000 to create your bootstrap confidence intervals.

Exercise 4

Your friend has never taken a statistics course. In your own words, describe to your friend the process of “bootstrap sampling” used in the previous exercise to create a bootstrap confidence interval. Walk your friend through the process (collecting a bootstrap sample, computing a statistic, and then repeating). You may find it helpful to describe it as drawing pieces of paper from a bag like we did in class.

Hint: What would you write on the slips of paper? How many pieces of paper would be in the bag(s)? How many bag(s) would you need for the above exercise?
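
If it helps to connect the paper-and-bag picture to code, the sketch below mimics it in base R on a handful of made-up values. The vector bag and its numbers are hypothetical, and this is only one way to picture the process; your written explanation should still be in your own words.

```r
set.seed(123)

# Hypothetical "bag" of slips with toy weights in grams; NOT the lab data.
bag <- c(2100, 2250, 2400, 1980, 2310)

# Draw slips one at a time, putting each back before the next draw.
one_resample <- sample(bag, size = length(bag), replace = TRUE)

# Repeat many times, recording the statistic of each resample.
boot_stats <- replicate(
  1000,
  mean(sample(bag, size = length(bag), replace = TRUE))
)

# The middle 95% of the recorded statistics gives a percentile interval.
quantile(boot_stats, probs = c(0.025, 0.975))
```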

Exercise 5

Do female lemurs differ in weight from male lemurs? Report the difference in mean weights between groups. Next, construct a bootstrap distribution for the difference in means between groups, use it to build a 99% bootstrap confidence interval, and interpret the interval. If the interval covers 0, you might think there is no difference between groups. Does the interval cover 0? Use set.seed(123) and reps = 10000.
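
For a two-group comparison, the recipe from Exercise 2 needs an explanatory variable and an order for the subtraction. The sketch below shows one way this might look on made-up data. The tibble toy_sex, its values, and the group labels "F" and "M" are assumptions for illustration, so check the codebook for how sex is actually coded. It uses reps = 1000 only to keep the example quick.

```r
library(tidyverse)
library(tidymodels)

# Hypothetical toy data, 12 animals per group; NOT the lab data.
toy_sex <- tibble(
  sex      = factor(rep(c("F", "M"), each = 12)),
  weight_g = c(seq(2000, 2550, by = 50), seq(2100, 2650, by = 50))
)

# Observed difference in means, computed as F minus M.
obs_diff <- toy_sex |>
  specify(weight_g ~ sex) |>
  calculate(stat = "diff in means", order = c("F", "M"))

set.seed(123)
boot_diff <- toy_sex |>
  specify(weight_g ~ sex) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "diff in means", order = c("F", "M"))

ci_99 <- boot_diff |>
  get_confidence_interval(level = 0.99, type = "percentile")

# TRUE when 0 lies between the lower and upper bounds.
covers_zero <- ci_99$lower_ci <= 0 & ci_99$upper_ci >= 0
```

Swapping the order to c("M", "F") flips the sign of the statistic and of both interval bounds, so state the direction of the subtraction when you interpret the interval.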

Exercise 6

Create a new column that tells you whether a lemur was born in the first or second half of the year (January through June vs. July through December). Create a meaningful plot to illustrate the weights of each group of lemurs. Use theme_bw() to replace the default gray plot background. Based on your figure, which group of lemurs weighs more? Is the distribution of one or both groups skewed or symmetric? See here for more information about skew vs symmetric distributions.
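
One possible shape for this step is sketched below on made-up data. The tibble toy_births, the column name birth_month (assumed to hold the month as a number from 1 to 12), and the new column birth_half are all hypothetical, so check the codebook for the variable that actually records birth dates in your subset. A boxplot is just one reasonable plot choice.

```r
library(tidyverse)

# Hypothetical toy data; NOT the lab dataset.
toy_births <- tibble(
  birth_month = c(1, 3, 6, 7, 9, 12, 4, 11),
  weight_g    = c(2100, 2250, 2400, 1980, 2310, 2190, 2500, 2050)
) |>
  mutate(
    # Months 1 through 6 are the first half of the year.
    birth_half = if_else(
      birth_month <= 6,
      "First half (Jan-Jun)",
      "Second half (Jul-Dec)"
    )
  )

p <- ggplot(toy_births, aes(x = birth_half, y = weight_g)) +
  geom_boxplot() +
  labs(
    title = "Weight by half of the year of birth (toy data)",
    x = "Half of the year in which the lemur was born",
    y = "Weight (g)"
  ) +
  theme_bw()

p
```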

Exercise 7

Is there more variability in weight for lemurs born in the first half or second half of the year? Report the estimated standard deviation of each group. Report a 90% bootstrap confidence interval for the standard deviation of each group. Interpret the confidence intervals in context. Use set.seed(123) and remember to set reps = 10000.
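
Because the same steps are repeated for each group, it can be convenient to wrap them in a small function. The sketch below does this on made-up data. The tibble toy_halves, the column birth_half, and the helper boot_sd_ci() are hypothetical names, and reps defaults to 1000 here only to keep the example fast.

```r
library(tidyverse)
library(tidymodels)

# Hypothetical toy data; NOT the lab dataset.
toy_halves <- tibble(
  birth_half = rep(c("first", "second"), each = 8),
  weight_g   = c(2000, 2100, 2150, 2200, 2250, 2300, 2400, 2500,
                 1800, 2100, 2300, 2600, 2900, 1700, 2450, 2750)
)

# Observed standard deviation of weight in each group.
obs_sd <- toy_halves |>
  group_by(birth_half) |>
  summarize(sd_weight = sd(weight_g))

# Helper made up for this sketch: percentile bootstrap CI for the sd.
boot_sd_ci <- function(data, level = 0.90, reps = 1000) {
  data |>
    specify(response = weight_g) |>
    generate(reps = reps, type = "bootstrap") |>
    calculate(stat = "sd") |>
    get_confidence_interval(level = level, type = "percentile")
}

set.seed(123)
ci_first  <- toy_halves |> filter(birth_half == "first")  |> boot_sd_ci()
ci_second <- toy_halves |> filter(birth_half == "second") |> boot_sd_ci()
```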

Pause and render

Now that you’ve completed a few more exercises, pause and render your document. If it renders without any issues, great! Move on to the next exercise. If it does not, debug the issue before moving on. Ask your TA for help if you need it. Do not proceed without rendering your document.

Part 2: IMS Exercises

The exercises in this section do not require code. Make sure to answer the questions in full sentences.

Exercise 8

IMS - Chapter 12 exercises, #2: Chronic illness.

Exercise 9

IMS - Chapter 12 exercises, #6: Bootstrap distributions of \(\hat{p}\), III.

Exercise 10

IMS - Chapter 12 exercises, #8: Waiting at an ER.

Wrap up

Submitting

Important

Before you proceed, first, make sure that you have updated the document YAML with your name! Then, render your document one last time, for good measure.

To submit your assignment to Gradescope:

  • Go to your Files pane and check the box next to the PDF output of your document (lab-6.pdf).

  • Then, in the Files pane, go to More > Export. This will download the PDF file to your computer. Save it somewhere you can easily locate, e.g., your Downloads folder or your Desktop.

  • Go to the course Canvas page and click on Gradescope and then click on the assignment. You’ll be prompted to submit it.

  • Mark the pages associated with each exercise. All of the pages of your lab should be associated with at least one question (i.e., should be “checked”).

Warning

If you fail to mark the pages associated with an exercise, that exercise won’t be graded. This means, if you fail to mark the pages for all exercises, you will receive a 0 on the assignment. The TAs can’t mark your pages for you, and for them to be able to grade, you must mark them.

Grading

Exercise       Points
Exercise 1     3
Exercise 2     6
Exercise 3     7
Exercise 4     5
Exercise 5     5
Exercise 6     7
Exercise 7     8
Exercise 8     3
Exercise 9     3
Exercise 10    3
Total          50