Regression overview

Lecture 9

Dr. Mine Çetinkaya-Rundel

Duke University
STA 101 - Fall 2023

Warm up

Check-in

  • Daily check-ins for getting you thinking at the beginning of the class and reviewing recent/important concepts
  • Go to Canvas and open today’s “quiz” titled 2023-10-02 Check-in
  • Access code: ___ (released in class)
  • “Graded” for completion

Exam 1

  • Classroom: You should have received your exam location assignment this morning. Reach out to me ASAP if you haven’t received it.
  • In-class: On Wednesday
    • 1 page, both sides notes (“cheat sheet”)
    • Nothing else
  • Take-home: Wednesday - Friday
    • Open everything except for other humans, except for me :)

Class check-ins so far

# load packages
library(tidyverse)
library(tidytext)
library(wordcloud)

# set a seed for reproducible random number generation
# used in placement of words in word clouds
set.seed(102)

Confounding variable

A confounding variable is a third variable that influences both the independent and dependent variables.

checkin_9_11 <- read_csv("data/2023-09-11 Check-in Survey Student Analysis Report.csv") |>
  select(contains("Define"))
names(checkin_9_11) <- "definition"

checkin_9_11 |>
  unnest_tokens(word, definition) |>
  anti_join(stop_words) |>
  filter(!(word %in% c("confounding", "variable", "variables"))) |>
  count(word) |>
  with(wordcloud(word, n, max.words = 100))  

Ridge plot

A ridge plot is type of data visualization that plots density curves for various groups on the same scale in a single plot. It’s used for comparing the distribution of a numerical variable across the levels of a categorical variable, as an alternative to side-by-side box plots.

checkin_9_13 <- read_csv("data/2023-09-13 Check-in Survey Student Analysis Report.csv") |>
  select(contains("Define"))
names(checkin_9_13) <- "definition"

checkin_9_13 |>
  unnest_tokens(word, definition) |>
  anti_join(stop_words) |>
  filter(!(word %in% c("ridge", "plot", "plots"))) |>
  count(word) |>
  with(wordcloud(word, n, max.words = 100)) 

Influential point

An influential point is a point with high leverage (away from the rest of the points in the x-direction) that influences the slope of the least squares line, i.e., if this point is removed, the slope of the line would change discernebly.

checkin_9_25 <- read_csv("data/2023-09-25 Check-in Survey Student Analysis Report.csv") |>
  select(contains("Define"))
names(checkin_9_25) <- "definition"

checkin_9_25 |>
  unnest_tokens(word, definition) |>
  anti_join(stop_words) |>
  filter(!(word %in% c("influential"))) |>
  count(word) |>
  with(wordcloud(word, n, max.words = 100))

Parsimonious

A parsimonious model is one that omits predictors that are found to be less important according to various model selection criteria, e.g., adjusted \(R^2\). It’s the simplest, most predictive model.

checkin_9_27 <- read_csv("data/2023-09-27 Check-in Survey Student Analysis Report.csv") |>
  select(contains("Define"))
names(checkin_9_27) <- "definition"

checkin_9_27 |>
  unnest_tokens(word, definition) |>
  anti_join(stop_words) |>
  filter(!(word %in% c("parsimonious"))) |>
  count(word) |>
  with(wordcloud(word, n, max.words = 100))

Parsimonious / money?!

checkin_9_27 |>
  filter(str_detect(definition, "money")) |>
  pull(definition)
[1] "It means unwilling to spend money or resources."                                                                                                                                                                                                            
[2] "unwilling to spend money, stingy"                                                                                                                                                                                                                           
[3] "It means to be unwilling in spending money."                                                                                                                                                                                                                
[4] "\"Parsimonious\" is an adjective that describes an individual or behavior characterized by extreme frugality, unwillingness to spend unnecessary money or resources, or simplicity in the use of resources. It implies being economical and avoiding waste."
[5] "unwilling to spend money"                                                                                                                                                                                                                                   

From last time

Application exercise

Go to Posit Cloud and continue the project titled ae-08-Tips.

ICYMI

Today’s daily check-in access code: ___ (released in class)

Q&A

Questions from anything so far?