Logistic regression

Lecture 11

Dr. Mine Çetinkaya-Rundel

Duke University
STA 101 - Fall 2023

Warm up

Project

Any questions about projects?

From last time

Application exercise

Go to Posit Cloud and continue the project titled ae-09-Spam.

Logistic regression

The model

Logit form:

\[ \log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots \]

Probability form:

\[ p = \frac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots}} \]
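Solving the logit form for \(p\) gives the probability form, so the two forms are inverses of each other. A minimal R sketch of that relationship; the helper names `logit()` and `inv_logit()` are hypothetical, while `qlogis()` and `plogis()` are base R's built-in versions of the same functions:

```r
# Hypothetical helpers: the logit (log-odds) and its inverse
logit     <- function(p) log(p / (1 - p))
inv_logit <- function(x) exp(x) / (1 + exp(x))

p   <- c(0.1, 0.5, 0.9)
eta <- logit(p)     # log-odds: about -2.197, 0, 2.197
inv_logit(eta)      # back to 0.1, 0.5, 0.9

# base R's stats package provides the same pair as qlogis() and plogis()
all.equal(logit(p), qlogis(p))
all.equal(inv_logit(eta), plogis(eta))
```

A probability of 0.5 corresponds to log-odds of 0, and probabilities equally far from 0.5 map to log-odds of equal size and opposite sign.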

Data + packages

library(tidyverse)
library(tidymodels)

# read the data; type is coded 0/1, and logistic_reg() needs a factor outcome
hp_spam <- read_csv("data/hp-spam.csv") |>
  mutate(type = as.factor(type))

hp_spam |>
  select(type, you, capitalTotal)
# A tibble: 4,601 × 3
   type    you capitalTotal
   <fct> <dbl>        <dbl>
 1 1      1.93          278
 2 1      3.47         1028
 3 1      1.36         2259
 4 1      3.18          191
 5 1      3.18          191
 6 1      0              54
 7 1      3.85          112
 8 1      0              49
 9 1      1.23         1257
10 1      1.67          749
# ℹ 4,591 more rows

Fit model and display output

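# logistic_reg() uses the "glm" engine by default
spam_fit <- logistic_reg() |>
  fit(type ~ you + capitalTotal, data = hp_spam)

# one row per term; estimates are on the log-odds scale
tidy(spam_fit)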
# A tibble: 3 × 5
  term         estimate std.error statistic   p.value
  <chr>           <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)  -1.50     0.0554       -27.1 2.97e-162
2 you           0.361    0.0198        18.3 1.84e- 74
3 capitalTotal  0.00173  0.000104      16.6 5.66e- 62
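`type` is a factor with levels `0` and `1`, so which probability does the model estimate? parsnip's default `glm` engine fits the model with `stats::glm()`, which treats the first factor level as the reference ("failure") and models the probability of the other level, here `type` = 1. A minimal sketch on simulated data (the variables `x`, `y_num`, and `y_fct` are hypothetical, not from `hp_spam`) checks this with `glm()` directly:

```r
set.seed(101)
n <- 500
x <- rnorm(n)
# simulate a 0/1 outcome from a known logistic model
y_num <- rbinom(n, size = 1, prob = plogis(-1 + 0.8 * x))
y_fct <- factor(y_num)   # levels "0", "1", like type in hp_spam

fit_num <- glm(y_num ~ x, family = binomial)
fit_fct <- glm(y_fct ~ x, family = binomial)
levels(y_fct)                              # "0" "1"
all.equal(coef(fit_num), coef(fit_fct))    # TRUE: both model P(y = 1)

# making "1" the first level flips the modeled event, so the signs flip
fit_rev <- glm(relevel(y_fct, ref = "1") ~ x, family = binomial)
all.equal(coef(fit_rev), -coef(fit_fct))   # TRUE
```

So the positive estimates for `you` and `capitalTotal` mean that higher values go with higher estimated log-odds of `type` = 1.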

The fitted model


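\[ \log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 \times \texttt{you} + \beta_2 \times \texttt{capitalTotal} \]

\[ \log\left(\frac{\hat{p}}{1 - \hat{p}}\right) = -1.50 + 0.361 \times \texttt{you} + 0.00173 \times \texttt{capitalTotal} \]

where \(p\) is the probability that an email is spam (`type` = 1) and \(\hat{p}\) is the probability estimated by the fitted model. Unlike linear regression, neither equation has an error term \(\epsilon\): the randomness comes from the 0/1 outcome itself.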
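To turn the fitted equation into a predicted probability, plug predictor values into the log-odds and then apply the inverse logit (the probability form). A minimal sketch using the rounded estimates from the `tidy()` output and a hypothetical email with `you` = 2 and `capitalTotal` = 500:

```r
# rounded estimates copied from tidy(spam_fit)
b0    <- -1.50
b_you <- 0.361
b_cap <- 0.00173

# hypothetical email, not a row from hp_spam
you          <- 2
capitalTotal <- 500

log_odds <- b0 + b_you * you + b_cap * capitalTotal   # 0.087
p_hat    <- plogis(log_odds)                          # about 0.522
p_hat

# odds ratio for a one-unit increase in you
exp(b_you)   # about 1.43
```

Exponentiating a slope gives an odds ratio: each one-unit increase in `you` multiplies the estimated odds of spam by about 1.43, holding `capitalTotal` constant. With the fitted object itself, `predict(spam_fit, new_data = tibble(you = 2, capitalTotal = 500), type = "prob")` returns this kind of prediction as `.pred_0` and `.pred_1` columns, computed from the unrounded estimates, so it may differ slightly.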

Exam 1
