Lab 2
The goal of this lab is to effectively visualize numerical and categorical data.
For all visualizations you create, be sure to include informative titles for the plot, axes, and legend!
Getting started
Go to Posit Cloud and start the project titled lab-2 - Hello Data!.
Under the Files tab on the lower right, click on lab-2.qmd to open the lab template. Complete the exercises in this document.
Packages
In this lab we will work with the tidyverse, a collection of packages for doing data analysis in a “tidy” way.
library(tidyverse)
Part 1: NC Courage
Today, we will be working with data from the first three full seasons of the NC Courage, a highly successful National Women’s Soccer League (NWSL) team located near Duke in Cary, NC. The Courage moved to the Triangle from Western New York in 2017 and had three epic seasons in NC, culminating in winning the championship game that was held at their stadium in Cary in 2019! Data for this lab was sourced from the nwslR package on GitHub and verified against the NC Courage website by Meredith Brown (Duke StatSci ’21) in a previous semester.
courage <- read_csv("data/courage.csv")
The courage dataset has 78 rows and 10 variables. The variables in the dataset are as follows:
Variable | Description |
---|---|
game_id | A unique ID for the game |
game_date | Game date |
game_number | Game number |
home_team | Name of the home team, abbreviated |
away_team | Name of the away team, abbreviated |
opponent | The team NC Courage played against |
home_pts | Number of points scored by the home team |
away_pts | Number of points scored by the away team |
result | Result of the game for NC Courage (win, loss, tie) |
season | Season (2017, 2018, or 2019) |
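Before you start plotting, it can help to confirm that a file loaded as described, for example that dim(courage) reports 78 rows and 10 columns. The sketch below runs the same checks on a small, made-up data frame written to a temporary CSV file, so it works without data/courage.csv; the columns team, pts, and season and their values are hypothetical. It assumes readr 2.0 or later, which added the show_col_types argument.

```r
library(tidyverse)

# Hypothetical toy data written to a temporary CSV file
toy_path <- tempfile(fileext = ".csv")
write_csv(
  tibble(team = c("NC", "NC", "NC"), pts = c(2, 1, 3), season = c(2017, 2018, 2019)),
  toy_path
)

# Read it back in, as you would read data/courage.csv
toy <- read_csv(toy_path, show_col_types = FALSE)

dim(toy)      # number of rows, then number of columns
glimpse(toy)  # one line per column: name, type, and first few values
```

The same two calls on courage are a quick sanity check that the right file was loaded.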
Exercise 1
Create a bar plot of the results of games for NC Courage. Additionally, calculate the numbers of wins, losses, and ties. Write a one-sentence narrative for your findings.
Hint: result is a categorical variable, so use a bar plot for the visualization and the count() function for calculating the frequencies of levels of this variable.
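The two tools in the hint work on any categorical variable. The sketch below applies them to a tiny made-up data frame (the pets data and its species column are hypothetical, not part of the lab data): count() returns one row per level with its frequency in a column named n, while geom_bar() does its own counting, so only the x aesthetic is mapped.

```r
library(tidyverse)

# Hypothetical toy data with one categorical variable
pets <- tibble(species = c("cat", "dog", "cat", "fish", "dog", "cat"))

# Frequency table: one row per level of species, counts in column n
pet_counts <- pets |>
  count(species)
pet_counts

# Bar plot: geom_bar() counts the rows in each level itself
pet_plot <- ggplot(pets, aes(x = species)) +
  geom_bar() +
  labs(
    x = "Species",
    y = "Number of pets",
    title = "Pets by species (toy data)"
  )
pet_plot
```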
Exercise 2
Create a new variable indicating whether the game was played at home or away for NC Courage. This variable should be called home_courage and take the value “home” if NC Courage is the home team and “away” if NC Courage is the away team. (Instructions for how to do this are given below.)
Then, calculate the number of home and away games, and write a one-sentence narrative for your findings.
Use the example code below to get started.
courage <- courage |>
  mutate(home_courage = if_else(home_team == "NC", "home", "away"))
There are two things of note here:
- The use of the assignment operator (<-) to assign the resulting data frame to courage, thus overwriting the courage dataset to contain this new column. We do this because we will use this new variable, home_courage, in a subsequent exercise.
- The use of a new function, if_else(), to determine whether the game is played at home or away. home_team == "NC" finds all rows where the home team is NC Courage.
  - If the home team is NC Courage, then we set the value of home_courage to "home".
  - Otherwise (else), Courage must be the away team, and we set the value of home_courage to "away".
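To see what if_else() does row by row, here is a minimal sketch on a hypothetical four-game data frame (the opponent codes T1 to T3 are made up; "NC" marks the Courage as in the lab data). The condition is evaluated for every row, and each row gets the second argument where the condition is TRUE and the third where it is FALSE.

```r
library(tidyverse)

# Hypothetical toy games; "NC" marks the Courage
games <- tibble(
  home_team = c("NC", "T1", "NC", "T2"),
  away_team = c("T3", "NC", "T1", "NC")
)

# home_team == "NC" gives TRUE/FALSE per row; if_else() maps them to labels
games <- games |>
  mutate(home_courage = if_else(home_team == "NC", "home", "away"))

games
```

Unlike base R's ifelse(), dplyr's if_else() requires the TRUE and FALSE values to have compatible types, which catches mistakes such as mixing a label with a number.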
Exercise 3
First, create a visualization that displays the relationship between home_courage and result. Then, calculate the proportions of home and away games that the Courage won. Based on these, do your findings suggest a home-field advantage? Why or why not?
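One common pattern for the computation step is group_by() followed by summarize() with mean() of a logical condition, since the mean of TRUE/FALSE values is the proportion of TRUEs. For the visualization, geom_bar(position = "fill") draws a segmented bar plot in which every bar has height 1, so the segments show proportions within each group. The sketch below uses hypothetical toy columns (location, outcome), not the courage data.

```r
library(tidyverse)

# Hypothetical toy data: location and outcome of six games
toy <- tibble(
  location = c("home", "home", "home", "away", "away", "away"),
  outcome  = c("win", "win", "loss", "win", "tie", "loss")
)

# Proportion of wins within each location:
# mean() of a logical vector is the share of TRUE values
win_props <- toy |>
  group_by(location) |>
  summarize(prop_win = mean(outcome == "win"))
win_props

# Segmented bar plot: position = "fill" rescales each bar to height 1
toy_plot <- ggplot(toy, aes(x = location, fill = outcome)) +
  geom_bar(position = "fill") +
  labs(
    x = "Location",
    y = "Proportion",
    fill = "Outcome",
    title = "Outcome by location (toy data)"
  )
toy_plot
```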
So far we have focused on whether the game was at home or away and whether the Courage won. Next, we dive deeper and focus on the number of points the Courage win by, at home and away.
Exercise 4
How many points do the Courage typically win by (on average)? Use the example code below to get started. You’ll encounter a new function, abs(), the absolute value function: it returns the absolute value of a number. Why do we want to use the absolute value function here?
Hint: We are only interested in games the Courage win, so we should filter() for those games first.
courage |>
  filter(___) |>
  mutate(win_pts = abs(home_pts - away_pts)) |>
  summarize(___)
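The skeleton above chains three steps: keep some rows, compute a new column, then collapse to one summary. Here is a minimal sketch of that chain on a hypothetical four-game toy data frame (the columns home_pts, away_pts, and outcome and their values are made up for illustration), preceded by a one-line look at abs().

```r
library(tidyverse)

# abs() drops the sign of each element
abs(c(-3, 0, 2))

# Hypothetical toy scores
toy <- tibble(
  home_pts = c(3, 0, 1, 2),
  away_pts = c(1, 2, 1, 0),
  outcome  = c("win", "win", "tie", "loss")
)

toy_summary <- toy |>
  filter(outcome == "win") |>                    # keep only the wins
  mutate(margin = abs(home_pts - away_pts)) |>   # size of the score gap
  summarize(mean_margin = mean(margin))          # one row: the average gap
toy_summary
```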
Exercise 5
How many points do NC Courage score, on average, when they win? Note that this is different from how many points they “win by”. How many points do the Courage score, on average, when they lose?
To calculate this, we first need to determine how many points NC Courage scored in every game. We can use if_else() logic again to find this value for each game and store it in a new column, courage_pts.
courage <- courage |>
  mutate(courage_pts = if_else(home_team == "NC", home_pts, away_pts))
courage |>
  group_by(___) |>
  summarize(___)
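if_else() can also pick between two columns element by element, which is how courage_pts takes home_pts in some rows and away_pts in others. The sketch below shows that step and a grouped mean on a hypothetical three-game toy data frame (the opponent code T1 and all scores are made up).

```r
library(tidyverse)

# Hypothetical toy games; "NC" marks the Courage
toy <- tibble(
  home_team = c("NC", "T1", "NC"),
  home_pts  = c(2, 1, 0),
  away_pts  = c(1, 3, 0),
  result    = c("win", "win", "tie")
)

# Row by row: home_pts when NC is at home, otherwise away_pts
toy <- toy |>
  mutate(courage_pts = if_else(home_team == "NC", home_pts, away_pts))

# Average points within each level of result
pts_by_result <- toy |>
  group_by(result) |>
  summarize(mean_pts = mean(courage_pts))
pts_by_result
```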
Exercise 6
Next, we’ll prepare to investigate visually whether or not NC Courage has a home-field advantage. To start, build on the courage data frame, which already contains the home_courage column from Exercise 2 and the courage_pts column from Exercise 5.
Mutate the courage data frame to create two new variables:
- total_pts: Sum of points scored by both teams, i.e., home_pts + away_pts.
- opponent_pts: Points scored by the opposing team, i.e., total_pts - courage_pts.
Save the resulting data frame as courage again and print the three points columns (total_pts, opponent_pts, courage_pts) to screen.
Hint:
- Use the mutate() function to create the columns.
courage <- courage |>
  mutate(
    total_pts = ___,
    opponent_pts = ___
  )
- Use the select() function to print them to screen:
courage |>
  select(total_pts, opponent_pts, courage_pts)
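Inside a single mutate() call, a new column can refer to a column created earlier in the same call, so opponent_pts can use total_pts right after it is defined. Here is a minimal sketch with hypothetical columns a and b; select() then prints only the columns of interest.

```r
library(tidyverse)

# Hypothetical toy data
toy <- tibble(a = c(1, 2, 3), b = c(10, 20, 30))

toy <- toy |>
  mutate(
    total   = a + b,      # created first
    share_a = a / total   # uses `total` from the line above
  )

toy |>
  select(total, share_a)
```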
Exercise 7
Create a scatter plot with the following features:
- opponent_pts (y) vs. courage_pts (x).
- Color the points by whether NC Courage are home or away.
- Represent the data with “jittered” points using geom_jitter().
- Overlay a \(y = x\) line with geom_abline().
- Facet by season.
What does the line represent? What does it mean for a point to fall above the line? Below the line?
ggplot(courage, aes(x = ___, y = ___, color = ___)) +
  geom_jitter(width = 0.1, height = 0.1) +
  geom_abline(slope = 1, intercept = 0) +
  facet_wrap(~ ___) +
  labs(
    x = "___",
    y = "___",
    title = "___",
    color = "___"
  )
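geom_jitter() adds a small random shift to each point so that games with identical scores do not hide one another; with width = 0.1 and height = 0.1, each point moves by at most 0.1 in each direction. Because the shift is random, the plot changes slightly every time you render. One option, sketched below on hypothetical toy scores (the columns courage and opponent are made up), is geom_point() with position_jitter() and its seed argument (available in ggplot2 3.0.0 and later), which makes the jitter reproducible.

```r
library(tidyverse)

# Hypothetical toy scores; three games share the score (1, 0)
toy <- tibble(
  courage  = c(1, 1, 1, 2, 2, 0),
  opponent = c(0, 0, 0, 1, 1, 0)
)

jitter_plot <- ggplot(toy, aes(x = courage, y = opponent)) +
  # Same amount of jitter as geom_jitter(width = 0.1, height = 0.1),
  # but fixed by `seed` so it is identical on every render
  geom_point(position = position_jitter(width = 0.1, height = 0.1, seed = 123)) +
  geom_abline(slope = 1, intercept = 0) +
  labs(
    x = "Courage points",
    y = "Opponent points",
    title = "Jittered scores with a y = x reference line (toy data)"
  )
jitter_plot
```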
Exercise 8
This exercise is a look at where we’re headed…
If we want to formally test whether the Courage have a home-field advantage, then we must first define what this means! In your own words, what do you think a home-field advantage means? Then, using your definition, describe what it means to not have a home-field advantage.
While there is a right answer, this part is graded for completion, so don’t worry too much about answering this in exactly the right way. Although graded for completion, your response must make sense to receive full points.
We’ll pick up here in a future class.
Part 2: IMS Exercises
The exercises in this section do not require code. Make sure to answer the questions in full sentences.
Exercise 9
IMS - Chapter 2 exercises, #20: Vitamin supplements.
Exercise 10
IMS - Chapter 2 exercises, #30: Screens, teens, and psychological well-being.
Wrap up
Submitting
Before you proceed, first, make sure that you have updated the document YAML with your name! Then, render your document one last time, for good measure.
To submit your assignment to Gradescope:
Go to your Files pane and check the box next to the PDF output of your document (lab-2.pdf). Then, in the Files pane, go to More > Export. This will download the PDF file to your computer. Save it somewhere you can easily locate, e.g., your Downloads folder or your Desktop.
Go to the course Canvas page and click on Gradescope and then click on the assignment. You’ll be prompted to submit it.
Mark the pages associated with each exercise. All of the pages of your lab should be associated with at least one question (i.e., should be “checked”).
If you fail to mark the pages associated with an exercise, that exercise won’t be graded. This means that if you fail to mark the pages for all exercises, you will receive a 0 on the assignment. The TAs can’t mark your pages for you, and for them to be able to grade, you must mark them.
Grading
Exercise | Points |
---|---|
Exercise 1 | 5 |
Exercise 2 | 5 |
Exercise 3 | 6 |
Exercise 4 | 6 |
Exercise 5 | 6 |
Exercise 6 | 4 |
Exercise 7 | 6 |
Exercise 8 | 2 |
Exercise 9 | 5 |
Exercise 10 | 5 |
Total | 50 |
Acknowledgements
This assignment was adapted from a similar exercise by Dr. Alex Fisher (who will be our guest lecturer soon!).