Exam 1 Review

Suggested answers

  1. There are 409 rows in the blizzard_salary dataset. Each row represents a Blizzard Entertainment worker who filled out the spreadsheet.
  2. a - Figure 1 - A shared x-axis makes it easier to compare summary statistics for the variable on the x-axis.
  3. c - It’s a value higher than the median for hourly but lower than the mean for salaried.
  4. b - There is more variability around the mean compared to the hourly distribution.
  5. b - Pie charts and waffle charts are for categorical data only.
  6. c - For every additional $1,000 of annual salary, the model predicts the raise to be higher, on average, by 0.0155%.
  7. d - \(R^2\) of raise_2_fit is higher than \(R^2\) of raise_1_fit since raise_2_fit has one more predictor and \(R^2\) always goes up with the addition of a predictor.
  8. The reference level of performance_rating is High, since it’s the first level alphabetically. Therefore, the coefficient -2.40% is the predicted difference in raise comparing High to Successful. In this context a negative coefficient makes sense since we would expect those with High performance rating to get higher raises than those with Successful performance.
  9. a - “Poor”, “Successful”, “High”, “Top”
  10. Choose Option 2 since it shows the proportions of employees with top, high, successful, and poor performance within each salary type, and is not affected by there being much fewer hourly paid employees. Proportions of employees with top and successful performance ratings are higher for employees paid hourly than salaried.
  11. There may be some NAs in these two variables that are not visible in the plot.
  12. The proportions under Hourly would go in the Hourly bar, and those under Salaried would go in the Salaried bar.
  13. This is a mosaic plot. It shows the marginal distribution of salary type (proportion of hourly and salaried employees), which is not displayed in the previous plot.
  14. c - Option 3. Parallel lines and salaried line has a higher intercept since Hourly is the reference level in raise_3_fit and the slope for salary_typeSalaried is positive.
  15. A parsimonious model is the simplest model with the best predictive performance.