Lecture 10
Duke University
STA 101 - Fall 2023
2023-10-09
Check-in code: slopes (released in class)
Project workday on Friday in lab: use this time for finalizing and polishing, not getting started
Projects due Friday at 5 pm
Peer evaluations due Friday at 6 pm:
You’ll receive an email about them later today from TEAMMATES
You must turn in a peer evaluation to get the points your teammates awarded you
Only one person should submit the project on Gradescope and select all team members’ names. See https://sta101-f23.github.io/project/project-1.html#submission for more information.
Any questions about projects?
Outcome: Numerical, Predictor: One numerical or one categorical with only two levels
Outcome: Numerical, Predictors: Any number of numerical or categorical variables with any number of levels
Outcome: Categorical with only two levels, Predictors: Any number of numerical or categorical variables with any number of levels
Outcome: Categorical with any number of levels, Predictors: Any number of numerical or categorical variables with any number of levels
The data contain 4601 emails collected at Hewlett-Packard labs, with 58 variables
Outcome: type
type = 1 is spam
type = 0 is non-spam
Predictors of interest:
capitalTotal: Number of capital letters in email
Percentages are calculated as (100 * number of times the WORD appears in the e-mail) / total number of words in email
george: Percentage of “george”s in email (these were George’s emails)
you: Percentage of “you”s in email
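The percentage formula above can be sketched in R as follows. This is only an illustration: `word_percentage()` is a hypothetical helper, the example email is made up, and the way words are split here is an assumption that may differ from how the original dataset was built.

```r
# Hypothetical helper: percentage of words in `text` equal to `word`,
# ignoring case. Tokenization is a simplifying assumption.
word_percentage <- function(text, word) {
  # Split on runs of non-letter characters to get a rough word list
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  words <- words[words != ""]
  # (100 * number of times the word appears) / total number of words
  100 * sum(words == tolower(word)) / length(words)
}

# Made-up email with 10 words: "you" appears 3 times, "george" twice
email <- "George, are you free? You said you would call George."

word_percentage(email, "you")     # 30
word_percentage(email, "george")  # 20
```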
What type of data is type? What type should it be in order to use logistic regression?
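One way to explore this question is to check how R stores the variable and then recode it. The sketch below uses a small made-up data frame, `toy_spam`, as a stand-in for `hp_spam`. It assumes the outcome is stored as the numbers 0 and 1, and that the modeling function you plan to use expects a categorical outcome stored as a factor.

```r
# Toy stand-in for hp_spam (values are made up for illustration)
toy_spam <- data.frame(
  type = c(1, 0, 0, 1, 0),
  capitalTotal = c(278, 15, 40, 1028, 9)
)

# How is the outcome currently stored?
class(toy_spam$type)   # "numeric"

# Recode as a factor with explicit labels for the two levels
toy_spam$type <- factor(
  toy_spam$type,
  levels = c(0, 1),
  labels = c("non-spam", "spam")
)

class(toy_spam$type)   # "factor"
levels(toy_spam$type)  # "non-spam" "spam"
table(toy_spam$type)   # counts of emails in each level
```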
capitalTotal
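george, is that you?
library(ggplot2)

# Distribution of the percentage of "george"s per email
ggplot(hp_spam, aes(x = george)) +
  geom_histogram()

# Distribution of the percentage of "you"s per email
ggplot(hp_spam, aes(x = you)) +
  geom_histogram()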
Logistic regression takes in a number of predictors and outputs the probability of a “success” (an outcome of 1) in a binary outcome variable.
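The probability is related to the predictors via a sigmoid function (the inverse of the logit link),
$$p(y_i = 1) = \frac{1}{1 + \exp\left(-(\beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki})\right)},$$
whose output is always between 0 and 1.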
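In this modeling scheme, one typically finds the coefficient estimates by maximizing the likelihood of the observed outcomes.
To proceed with building our email classifier, we will, as usual, use our data (outcome and predictor values) to estimate the coefficients.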
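To make the sigmoid and the fitting step concrete, here is a minimal sketch in base R. It uses simulated data rather than the spam emails, and the "true" coefficients `beta0` and `beta1` are made up. `glm()` with `family = binomial` is one common way to fit a logistic regression in R, not necessarily the function used in the application exercise.

```r
set.seed(101)

# Sigmoid (inverse logit): maps any real number to a value between 0 and 1
sigmoid <- function(z) 1 / (1 + exp(-z))

sigmoid(0)                        # 0.5
all.equal(sigmoid(2), plogis(2))  # base R's plogis() computes the same thing

# Simulated data (not the hp_spam emails)
n <- 500
x <- rnorm(n)
beta0 <- -1   # made-up "true" intercept
beta1 <- 2    # made-up "true" slope
y <- rbinom(n, size = 1, prob = sigmoid(beta0 + beta1 * x))

# Fit logistic regression; glm() estimates the coefficients
# by maximum likelihood
fit <- glm(y ~ x, family = binomial)
coef(fit)

# Predicted probability of a "success" (y = 1) when x = 0.5
p_hat <- predict(fit, newdata = data.frame(x = 0.5), type = "response")
p_hat
```

With `type = "response"`, `predict()` returns a probability; leaving it out returns the log-odds instead.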
Go to Posit Cloud and continue the project titled ae-09-Spam.
ICYMI
Today’s daily check-in access code: slopes (released in class)