library(tidyverse)
library(tidymodels)
Apple and Microsoft stock prices
These are suggested answers for the application exercise. They’re not necessarily complete or 100% accurate, they’re roughly what we develop in class while going through the exercises.
Today we’ll explore the question “How do stock prices of Apple and Microsoft relate to each other?”
Goals
Understand the grammar of linear modeling, including \(y\), \(x\), \(b\), \(e\) fitted estimates and residuals
Add linear regression plots to your 2D graphs
Write a simple linear regression model mathematically
Fit the model to data in R in a
tidy
way
Packages
Data
The data for this application exercise was originally gathered using the tidyquant R package. It features Apple and Microsoft stock prices from January 1st 2020 to December 31st 2021.
First, let’s load the data:
<- read_csv("data/stocks.csv") stocks
To keep things simple, we’ll work with a subset of the data, stock prices in January 2020.
<- stocks |>
stocks_jan2020 filter(month(date) == 1 & year(date) == 2020)
Models and residuals
Exercise 1
At first, you might be tempted to minimize \(\sum_i e_i\), the sum of all residuals, but this is problematic. Why? Can you come up with a better solution (other than the one listed below)?
Large positive residuals would cancel with large negative residuals.
We could minimize the sum of the absolute value of the residuals.
Minimize the sum of squared residuals
In practice, we minimize the sum of squared residuals:
\[ \sum_i e_i^2 \]
Note, this is the same as
\[ \sum_i (y_i - \hat{y}_i)^2 \]
Exercise 2
Check out an interactive visualization of “least squares regression” here. Click on I
and drag the points around to get started. Describe what you see.
Each square is a “square residual”. The line minimizes the sum of squared residuals, (i.e. it minimizes the total area of squares I see on the screen).
This is called “Ordinary Least Squares” (OLS) regression.
Plotting the least squares regression line
Plotting the OLS regression line, that is, the line that minimizes the sum of square residuals takes one new geom. Simply add
geom_smooth(method = "lm", se = FALSE)
to your plot.
method = "lm"
says to draw a line according to a “linear model” and se = FALSE
turns off standard error bars. You can try without these options for comparison.
Optionally, you can change the color of the line, e.g.
geom_smooth(method = '"lm", se = FALSE, color = "red")
Exercise 4
In the slides we fit a model with slope 0.5 and intercept -5. The code for layering the line that represents the model over your data is given below. Add geom_smooth()
as described above with color = "steelblue"
to see how close your line is.
ggplot(stocks_jan2020, aes(x = MSFT.Open, y = AAPL.Open)) +
geom_point() +
geom_abline(slope = 0.5, intercept = -5) +
geom_smooth(method = "lm", se = F, color = "steelblue") +
labs(
x = "MSFT Open",
y = "AAPL Open",
title = "Open prices of MSFT and AAPL",
subtitle = "January 2020"
)
`geom_smooth()` using formula = 'y ~ x'
Finding \(\hat{\beta}\)
To fit the model in R, i.e. to “find \(\hat{\beta}\)”, use the code below as a template:
model_fit <- linear_reg() |>
fit(y ~ x, data = dataframe)
linear_reg
tellsR
we will perform linear regressionfit
defines the outcome \(y\), predictor \(x\) and the data set
Running the code above, but replacing the arguments of the fit
command appropriately will create a new object called model_fit
that stores all information about your fitted model.
To access the information, you can run, e.g.
tidy(model_fit)
Let’s try it out.
Exercise 5
Find the least squares line \(y = \hat{\beta_0} + \hat{\beta_1} x\) for January 2020, where \(x\) is Microsoft’s opening price and \(y\) is Apple’s opening price. Display a tidy summary of your model.
<- linear_reg() |>
stock_fit fit(AAPL.Open ~ MSFT.Open, data = stocks_jan2020)
tidy(stock_fit)
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 3.31 8.87 0.373 0.713
2 MSFT.Open 0.454 0.0541 8.40 0.0000000808
Exercise 6
Re-write the fitted equation replacing \(\hat{\beta}_0\) and \(\hat{\beta}_1\) with the estimates from the model you fit in the previous exercise.
My fitted model is:
\[ \hat{y} = 3.3117408 + 0.4542263 x \]
where \(\hat{y}\) is the predicted Apple open price and \(x\) is the open price of Microsoft.
Exercise 7
Interpret \(\hat{\beta}_0\) and \(\hat{\beta}_1\) in context of the data.
\(\hat{\beta_0} = 3.3117408\)
It’s the price of APPL if MSFT opened at 0.
\(\hat{\beta_1} = 0.4542263\).
For every dollar increase in MSFT open price, we expect AAPL open price to increase by .45.
Bonus exercise
Say Microsoft opens at 166 dollars. What do I predict the opening price of AAPL to be?
<- 3.3117408 + (0.4542263 * 166)
yhat yhat
[1] 78.71331