Telling a data story

Lecture 24

Dr. Mine Çetinkaya-Rundel

Duke University
STA 101 - Fall 2023

Warm up

Announcements

  • Check your grades on Canvas; if anything seems off, let me know ASAP. Your final grade will be calculated based on what’s on Canvas. No grades will be changed after the project is due.
  • My office hours next week are by appointment.
  • Don’t forget to fill out course and TA evaluations.
  • Project 2:
    • Friday - Peer review. You must be in your lab section to participate in the peer review (and be eligible for the points for it).
    • Next Thursday - Video (upload to YouTube/Warpwire/etc. and share link on Google sheet) + writeup (on Gradescope) due.

Setup

library(tidyverse)
library(tidymodels)
library(ggrepel)
library(ggthemes)
library(palmerpenguins)

Telling a story

Multiple ways of telling a story

  • Sequential reveal: Motivation, then resolution

  • Instant reveal: Resolution, and hidden in it motivation

Simplicity vs. complexity

When you try to show too much data at once, you may end up not showing anything.

  • Never assume your audience can rapidly process complex visual displays

  • Don’t add variables to your plot that are tangential to your story

  • Don’t jump straight to a highly complex figure; first show an easily digestible subset (e.g., show one facet first)

  • Aim for memorable, but clear

Project note: Make sure to leave time to iterate on your plots after you practice your presentation. If certain plots or outputs are getting too wordy to explain, take time to simplify them!

Consistency vs. repetitiveness

Be consistent but don’t be repetitive.

  • Use consistent features throughout plots (e.g., same color represents same level on all plots)

  • Aim to use a different type of summary or visualization for each distinct analysis

Project note: If possible, ask a friend who is not in the class to listen to your presentation and then ask them what they remember. Then, ask yourself: is that what you wanted them to remember?

Your project plans

How are you telling your story?

  1. Sequential reveal

  2. Instant reveal

  3. Our approach doesn’t fit either of these paradigms

  4. No idea

Submit your answer on Canvas for 12-06 Check-in (access code: ___)

Designing effective visualizations

Data

d <- tribble(
  ~category,                     ~value,
  "Cutting tools"                , 0.03,
  "Buildings and administration" , 0.22,
  "Labor"                        , 0.31,
  "Machinery"                    , 0.27,
  "Workplace materials"          , 0.17
)
d
# A tibble: 5 × 2
  category                     value
  <chr>                        <dbl>
1 Cutting tools                 0.03
2 Buildings and administration  0.22
3 Labor                         0.31
4 Machinery                     0.27
5 Workplace materials           0.17

Keep it simple

Judging relative area

Use color to draw attention
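
A minimal sketch of this idea, using the same values as the `d` tibble from the Data slide: one category is drawn in a saturated color and the rest in gray. The choice of "Labor" as the highlighted category, the `highlight` column, and the two color names are assumptions made for illustration, not taken from the original slides.

```r
library(tidyverse)

# Same values as the `d` tibble on the Data slide
d <- tribble(
  ~category,                      ~value,
  "Cutting tools",                0.03,
  "Buildings and administration", 0.22,
  "Labor",                        0.31,
  "Machinery",                    0.27,
  "Workplace materials",          0.17
)

# Flag the single category the story is about (an arbitrary choice here)
d_highlight <- d |>
  mutate(highlight = category == "Labor")

# Gray for context, one strong color for the bar that matters
p_highlight <- ggplot(d_highlight, aes(x = value, y = category, fill = highlight)) +
  geom_col(show.legend = FALSE) +
  scale_fill_manual(values = c("TRUE" = "firebrick", "FALSE" = "gray70"))

p_highlight
```

The legend is suppressed because the color carries emphasis rather than a variable the audience needs to decode.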



Play with themes for a non-standard look

Go beyond ggplot2 themes – ggthemes

Tell a story

Leave out non-story details

Order matters
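
One possible way to control the order of the bars is to reorder the factor levels by value before plotting. The sketch below uses `fct_reorder()` from forcats (loaded with the tidyverse) on the same values as the `d` tibble; the names `d_ordered` and `p_ordered` are hypothetical.

```r
library(tidyverse)

# Same values as the `d` tibble on the Data slide
d <- tribble(
  ~category,                      ~value,
  "Cutting tools",                0.03,
  "Buildings and administration", 0.22,
  "Labor",                        0.31,
  "Machinery",                    0.27,
  "Workplace materials",          0.17
)

# Reorder the category levels by ascending value instead of alphabetically
d_ordered <- d |>
  mutate(category = fct_reorder(category, value))

# With categories on the y-axis, the first level is drawn at the bottom,
# so the largest value ends up at the top of the plot
p_ordered <- ggplot(d_ordered, aes(x = value, y = category)) +
  geom_col()

p_ordered
```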

Clearly indicate missing data

Reduce cognitive load

Use descriptive titles

Annotate figures
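
As a rough sketch of a descriptive title combined with an annotation, the example below uses the built-in `mtcars` data. The title states the takeaway rather than naming the variables, and `annotate()` labels the single car with the highest fuel efficiency. The object names and the wording of the title are assumptions for illustration.

```r
library(tidyverse)

# Find the car with the highest miles per gallon
best <- mtcars |>
  rownames_to_column("car") |>
  slice_max(mpg, n = 1, with_ties = FALSE)

p_annotated <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point() +
  # Label only the point the story is about, slightly to its right
  annotate(
    "text",
    x = best$disp, y = best$mpg,
    label = best$car, hjust = -0.1
  ) +
  # The title states the finding; the axis labels carry the units
  labs(
    title = "Cars with larger engines get fewer miles per gallon",
    x = "Displacement (cu. in.)",
    y = "Miles per gallon"
  )

p_annotated
```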

Plot sizing and layout

Sample plots

p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2)

p_text <- mtcars |>
  rownames_to_column() |>
  ggplot(aes(x = disp, y = mpg)) +
  geom_text_repel(aes(label = rowname)) +
  coord_cartesian(clip = "off")

Small fig-width

For a zoomed-in look

```{r}
#| fig-width: 3
#| fig-asp: 0.618

p_hist
```

Large fig-width

For a zoomed-out look

```{r}
#| fig-width: 6
#| fig-asp: 0.618

p_hist
```

fig-width affects text size
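
Text in a ggplot is drawn at a fixed physical size, so the same plot rendered on a narrower canvas has relatively larger labels once both images are displayed at the same width. Outside of Quarto, one way to see this is to save the histogram at two widths with `ggsave()`. This is only a sketch: the temporary file names and the `dpi` value are arbitrary choices.

```r
library(ggplot2)

# Same histogram as p_hist on the Sample plots slide
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2)

small_file <- tempfile(fileext = ".png")
large_file <- tempfile(fileext = ".png")

# 3 inches wide: labels take up more of the canvas (zoomed-in look)
ggsave(small_file, p_hist, width = 3, height = 3 * 0.618, units = "in", dpi = 150)

# 6 inches wide: labels take up less of the canvas (zoomed-out look)
ggsave(large_file, p_hist, width = 6, height = 6 * 0.618, units = "in", dpi = 150)
```

Opening the two files side by side at the same on-screen width makes the difference in relative text size visible.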

Multiple plots on a slide

First, ask yourself, must you include multiple plots on a slide? For example, is your narrative about comparing results from two plots?

  • If no, then don’t! Move the second plot to the next slide!

  • If yes, use columns and sequential reveal.

Quarto

Writing your project report with Quarto

  • Figure sizing: fig-width, fig-height, etc. in code chunks.

  • Figure layout: layout-ncol for placing multiple figures in a chunk.

  • Further control over figure layout with the patchwork package.

  • Chunk options around what makes it in your final report: message, echo, etc.

  • Cross referencing figures and tables.

  • Adding footnotes and citations.
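
For the figure layout item above, a minimal patchwork sketch might look like the following. It assumes the patchwork package is installed (it is not loaded in the Setup slide), and the plot names and relative widths are illustrative choices.

```r
library(ggplot2)
library(patchwork)

p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2)

p_scatter <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point()

# `+` combines the plots; plot_layout() places them in two columns,
# with the scatterplot twice as wide as the histogram
p_combined <- p_hist + p_scatter +
  plot_layout(ncol = 2, widths = c(1, 2)) +
  # Tag the panels A and B so they can be referred to in the text
  plot_annotation(tag_levels = "A")

p_combined
```

Compared with `layout-ncol`, which arranges separate figures produced by a chunk, patchwork builds a single combined figure, so one label and one caption apply to the whole composition.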

Cross referencing figures

As seen in Figure 1, there is a positive and relatively strong relationship between body mass and flipper length of penguins.

ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point()
Figure 1: The relationship between body mass and flipper length of penguins.

As seen in @fig-penguins, there is a positive and relatively strong relationship between body mass and flipper length of penguins.

```{r}
#| label: fig-penguins
#| fig-cap: The relationship between body mass and flipper length of penguins.

ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point()
```

Cross referencing tables

The regression output is shown in Table 1.

penguins_fit <- linear_reg() |>
  fit(body_mass_g ~ flipper_length_mm, data = penguins)

tidy(penguins_fit) |>
  knitr::kable(digits = 3)
Table 1: The regression output for predicting body mass from flipper length of penguins.

term                estimate   std.error  statistic  p.value
(Intercept)        -5780.831     305.815    -18.903        0
flipper_length_mm     49.686       1.518     32.722        0

The regression output is shown in @tbl-penguins-lm.

```{r}
#| label: tbl-penguins-lm
#| tbl-cap: The regression output for predicting body mass from flipper length of penguins.

penguins_fit <- linear_reg() |>
  fit(body_mass_g ~ flipper_length_mm, data = penguins)

tidy(penguins_fit) |>
  knitr::kable(digits = 3)
```