background-image: url("img/DAW.png")
background-position: left
background-size: 50%
class: middle, center, inverse

.pull-right[

## .whitish[ANOVA Test]
## .whitish[Inference for]
## .whitish[Many Means]

<br>

### .whitish[Kelly McConville]

#### .yellow[ Stat 100 | Week 12 | Spring 2022]

]

---

### Announcements

* Project Assignment 3 is due Friday, April 22nd at 5pm.

****************************

--

### Goals for Today

.pull-left[
* Cover the **ANOVA** test.

* Learn about the F distribution.
]

.pull-right[
* Compare **Simulation** methods versus **Probability Model** methods for inference.

* Start exploring inference for linear regression models.
]

---

### Inference for Many Means

Consider the situation where:

* Response variable: quantitative
* Explanatory variable: categorical

--

* Parameter of interest: `\(\mu_1 - \mu_2\)`

--

This parameter of interest only makes sense if the explanatory variable is restricted to two categories.

--

It is time to learn how to conduct inference for more than two means.

---

### Hypotheses

Consider the situation where:

* Response variable: quantitative
* Explanatory variable: categorical

--

`\(H_o\)`: `\(\mu_1 = \mu_2 = \cdots = \mu_K\)` (Variables are independent/not related.)

`\(H_a\)`: At least one mean is not equal to the rest. (Variables are dependent/related.)

---

### Example

Do audience ratings vary by movie genre?

```r
library(tidyverse)
# Load data
library(Lock5Data)
movies <- HollywoodMovies2011 %>%
  filter(!(Genre %in% c("Fantasy", "Adventure"))) %>%
  drop_na(Genre, AudienceScore)
```

* **Cases**:

* **Variables of interest (including type)**:

* **Hypotheses**:

---

### Example

.pull-left[
Does there appear to be a relationship?
```r
ggplot(data = movies,
       mapping = aes(x = Genre, y = AudienceScore)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point",
               color = "purple", shape = 8, size = 3)
```
]

.pull-right[
<img src="stat100_wk12mon_files/figure-html/movies-1.png" width="768" style="display: block; margin: auto;" />
]

--

What movie did the audience hate so much??

---

### Example

.pull-left[
Does there appear to be a relationship?

```r
bad <- filter(movies, AudienceScore == min(AudienceScore))
ggplot(data = movies,
       mapping = aes(x = Genre, y = AudienceScore)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point",
               color = "purple", shape = 8, size = 3) +
  geom_label(data = bad, mapping = aes(label = Movie))
```
]

.pull-right[
<img src="stat100_wk12mon_files/figure-html/bad-1.png" width="768" style="display: block; margin: auto;" />
]

What movie did the audience hate so much??

---

### Trespass

<img src="img/trespass.001.jpeg" width="1275" style="display: block; margin: auto;" />

---

### Test Statistic

Need a test statistic!

--

* Won't be a sample statistic.

$$ \bar{x}_1 - \bar{x}_2 - \cdots - \bar{x}_K \mbox{ won't work!} $$

--

* Needs to measure the discrepancy between the **observed** sample and the sample **we'd expect** to see if `\(H_o\)` were true.

--

* Would be nice if its null distribution could be approximated by a known probability model.

******************************

Let's return to the **name** of the test.

--

* Called "Analysis of **VARIANCE**" test.

--

* Not called "Analysis of **MEANS**" test.

--

**Question**: Why analyze **variability** to test differences in means?

---

### Why analyze **variability** to test differences in means?

Let's look at some simulated data for a moment.

<img src="stat100_wk12mon_files/figure-html/unnamed-chunk-5-1.png" width="864" style="display: block; margin: auto;" />

**Question**: For which scenario are you most convinced that the means are different?
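--

The intuition can be reproduced with a quick simulation (a sketch; the group means, spreads, and seed below are arbitrary choices, not the values behind the plots above):

```r
set.seed(100)  # arbitrary seed, for reproducibility
means <- rep(c(50, 60, 70), each = 30)  # identical group means in both datasets
groups <- rep(1:3, each = 30)
# Small within-group spread: the mean differences stand out
small_spread <- means + rnorm(90, sd = 2)
# Large within-group spread: the same mean differences get buried in noise
large_spread <- means + rnorm(90, sd = 25)
# Between-group signal is the same; only the within-group variability differs
tapply(small_spread, groups, sd)
tapply(large_spread, groups, sd)
```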
---

### Key Idea: Partitioning the Variability

.pull-left[
<img src="stat100_wk12mon_files/figure-html/unnamed-chunk-6-1.png" width="864" style="display: block; margin: auto;" />
]

.pull-right[
`\begin{align*}
\mbox{Total Variability} & = \\
& \mbox{Variability Between Groups} + \\
& \mbox{Variability Within Groups}
\end{align*}`
]

--

.pull-left[
* Variability **Between** Groups: How much the group means vary
    + Compare the red dots.
]

--

.pull-right[
* Variability **Within** Groups: How much natural group variability there is
    + Within groups, compare the black dots to the red dot.
]

---

### Key Idea: Partitioning the Variability

`\begin{align*}
\mbox{Total Variability} & = \mbox{Variability Between Groups} + \mbox{Variability Within Groups}
\end{align*}`

* Variability **Between** Groups: How much the group means vary
    + Compare the red dots.

`\begin{align*}
\mbox{Variability Between Groups} &= \sum n_i (\bar{x}_i - \bar{x})^2 \\
& = \mbox{Sum of Squares Group} \\
& = \mbox{SSG}
\end{align*}`

--

* Variability **Within** Groups: How much natural group variability there is
    + Within groups, compare the black dots to the red dot.

`\begin{align*}
\mbox{Variability Within Groups} &= \sum (x - \bar{x}_i)^2 \\
& = \mbox{Sum of Squares Error} \\
& = \mbox{SSE}
\end{align*}`

--

* Total Variability: How much points vary from the overall mean

`\begin{align*}
\mbox{Total Variability} &= \sum (x - \bar{x})^2 \\
& = \mbox{Sum of Squares Total} \\
& = \mbox{SSTotal}
\end{align*}`

---

### Mean Squares

Need to standardize the Sums of Squares to compare SSG to SSE.

--

`\begin{align*}
\mbox{Mean Variability Between Groups} & = \frac{\mbox{SSG}}{K - 1} = \mbox{MSG}
\end{align*}`

`\begin{align*}
\mbox{Mean Variability Within Groups} & = \frac{\mbox{SSE}}{n - K} = \mbox{MSE}
\end{align*}`

--

* Now on a comparable scale!

* Now we can create a test statistic that compares these two measures of variability.
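--

As a concrete check, the sums of squares and mean squares can be computed by hand (a sketch, repeating the data setup from the Example slide; with the movies data, the ratio matches the F statistic of 3.88 that appears on a later slide):

```r
library(tidyverse)
library(Lock5Data)   # same data setup as on the Example slide
movies <- HollywoodMovies2011 %>%
  filter(!(Genre %in% c("Fantasy", "Adventure"))) %>%
  drop_na(Genre, AudienceScore)

xbar <- mean(movies$AudienceScore)                      # overall mean
group_means <- ave(movies$AudienceScore, movies$Genre)  # each case's group mean
SSG <- sum((group_means - xbar)^2)                      # between-group sum of squares
SSE <- sum((movies$AudienceScore - group_means)^2)      # within-group sum of squares
SSTotal <- sum((movies$AudienceScore - xbar)^2)
all.equal(SSTotal, SSG + SSE)       # the partition holds
K <- length(unique(movies$Genre))
n <- nrow(movies)
(SSG / (K - 1)) / (SSE / (n - K))   # MSG / MSE
```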
---

### Test Statistic

In some ways, MSG is the natural test statistic, but as we saw for this example, MSG alone isn't enough.

<img src="stat100_wk12mon_files/figure-html/unnamed-chunk-7-1.png" width="864" style="display: block; margin: auto;" />

--

Scenarios 2 and 3 have roughly the same MSG, but we are much more convinced that the means are different for 2 than for 3.

--

That is where MSE comes in!

---

### Test Statistic

$$ F = \frac{\mbox{MSG}}{\mbox{MSE}} = \frac{\mbox{variance between groups}}{\mbox{variance within groups}} $$

If `\(H_o\)` is true, then `\(F\)` should be roughly equal to what?

--

If `\(H_a\)` is true, then `\(F\)` should be greater than 1 because there is more variation in the group means than we'd expect if the population means are all equal.

---

### Returning to the Movies Example

```r
library(infer)
# Compute F test stat
test_stat <- movies %>%
  specify(AudienceScore ~ Genre) %>%
  calculate(stat = "F")
test_stat
```

```
## Response: AudienceScore (numeric)
## Explanatory: Genre (factor)
## # A tibble: 1 × 1
##    stat
##   <dbl>
## 1  3.88
```

--

* Is 3.88 a **large** test statistic? Is a test statistic of 3.88 **unusual** under `\(H_o\)`?

---

### Generating the Null Distribution

.pull-left[

```
##    AudienceScore     Genre
## 1             49    Comedy
## 2             68     Drama
## 3             91     Drama
## 4             62    Comedy
## 5             53     Drama
## 6             73 Animation
## 7             42    Comedy
## 8             76 Animation
## 9             63    Comedy
## 10            54    Comedy
## 11            55    Comedy
## 12            59 Animation
## 13            77    Comedy
## 14            38    Action
## 15            59    Comedy
## 16            50   Romance
## 17            24  Thriller
## 18            61    Comedy
## 19            31    Horror
## 20            70  Thriller
```
]

--

.pull-right[
**Steps**:

1. Shuffle Genre.

2. Compute the `\(MSE\)` and `\(MSG\)`.

3. Compute the test statistic.

4. Repeat 1 - 3 many times.
]

---

### Generating the Null Distribution

.pull-left[

```r
# Construct null distribution
null_dist <- movies %>%
  specify(AudienceScore ~ Genre) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "F")
visualize(null_dist)
```
]

.pull-right[
<img src="stat100_wk12mon_files/figure-html/null-1.png" width="768" style="display: block; margin: auto;" />
]

---

### The Null Distribution

.pull-left[
**Key Observations**:

* Smallest possible value?

<br>

* Shape?
]

.pull-right[
<img src="stat100_wk12mon_files/figure-html/unnamed-chunk-11-1.png" width="576" style="display: block; margin: auto;" />
]

---

### The Null Distribution

.pull-left[
**Key Observations**:

* Smallest possible value?

<br>

* Shape?

<br>

* Is our observed test statistic unusual?
]

.pull-right[
<img src="stat100_wk12mon_files/figure-html/unnamed-chunk-12-1.png" width="576" style="display: block; margin: auto;" />
]

---

### The P-value

```r
# Compute p-value
null_dist %>%
  get_p_value(obs_stat = test_stat, direction = "greater")
```

```
## # A tibble: 1 × 1
##   p_value
##     <dbl>
## 1       0
```

---

### Approximating the Null Distribution

.pull-left[
If

* There are at least 30 observations **in each group** or the response variable is normal

* The variability is similar for all groups

then

$$ \mbox{test statistic} \sim F(df1 = K - 1, df2 = n - K) $$
]

.pull-right[
<img src="stat100_wk12mon_files/figure-html/unnamed-chunk-14-1.png" width="576" style="display: block; margin: auto;" />
]

---

### The ANOVA Test

Check assumptions!

```r
movies %>%
  group_by(Genre) %>%
  summarize(n(), sd(AudienceScore))
```

```
## # A tibble: 7 × 3
##   Genre     `n()` `sd(AudienceScore)`
##   <fct>     <int>               <dbl>
## 1 Action       32                18.4
## 2 Animation    12                13.9
## 3 Comedy       27                15.7
## 4 Drama        21                14.5
## 5 Horror       17                15.9
## 6 Romance      10                12.9
## 7 Thriller     13                14.9
```

---

### The ANOVA Test

Check assumptions!
```r
ggplot(data = movies, mapping = aes(x = AudienceScore)) +
  geom_histogram(bins = 15) +
  facet_wrap(~Genre)
```

<img src="stat100_wk12mon_files/figure-html/unnamed-chunk-16-1.png" width="864" style="display: block; margin: auto;" />

---

### The ANOVA Test

```r
library(broom)
mod_anova <- aov(AudienceScore ~ Genre, data = movies)
tidy(mod_anova)
```

```
## # A tibble: 2 × 6
##   term         df  sumsq meansq statistic p.value
##   <chr>     <dbl>  <dbl>  <dbl>     <dbl>   <dbl>
## 1 Genre         6  5855.   976.      3.88 0.00137
## 2 Residuals   125 31413.   251.     NA   NA
```

---

### Connection to Linear Regression

```r
library(moderndive)
mod_reg <- lm(AudienceScore ~ Genre, data = movies)
get_regression_table(mod_reg, print = TRUE)
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> term </th>
   <th style="text-align:right;"> estimate </th>
   <th style="text-align:right;"> std_error </th>
   <th style="text-align:right;"> statistic </th>
   <th style="text-align:right;"> p_value </th>
   <th style="text-align:right;"> lower_ci </th>
   <th style="text-align:right;"> upper_ci </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> intercept </td>
   <td style="text-align:right;"> 58.625 </td>
   <td style="text-align:right;"> 2.802 </td>
   <td style="text-align:right;"> 20.920 </td>
   <td style="text-align:right;"> 0.000 </td>
   <td style="text-align:right;"> 53.079 </td>
   <td style="text-align:right;"> 64.171 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Genre: Animation </td>
   <td style="text-align:right;"> 5.458 </td>
   <td style="text-align:right;"> 5.366 </td>
   <td style="text-align:right;"> 1.017 </td>
   <td style="text-align:right;"> 0.311 </td>
   <td style="text-align:right;"> -5.162 </td>
   <td style="text-align:right;"> 16.079 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Genre: Comedy </td>
   <td style="text-align:right;"> 0.486 </td>
   <td style="text-align:right;"> 4.143 </td>
   <td style="text-align:right;"> 0.117 </td>
   <td style="text-align:right;"> 0.907 </td>
   <td style="text-align:right;"> -7.713 </td>
   <td style="text-align:right;"> 8.685 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Genre: Drama </td>
   <td style="text-align:right;"> 13.470 </td>
   <td style="text-align:right;"> 4.452 </td>
   <td style="text-align:right;"> 3.026 </td>
   <td style="text-align:right;"> 0.003 </td>
   <td style="text-align:right;"> 4.659 </td>
   <td style="text-align:right;"> 22.281 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Genre: Horror </td>
   <td style="text-align:right;"> -9.978 </td>
   <td style="text-align:right;"> 4.758 </td>
   <td style="text-align:right;"> -2.097 </td>
   <td style="text-align:right;"> 0.038 </td>
   <td style="text-align:right;"> -19.394 </td>
   <td style="text-align:right;"> -0.562 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Genre: Romance </td>
   <td style="text-align:right;"> 6.175 </td>
   <td style="text-align:right;"> 5.743 </td>
   <td style="text-align:right;"> 1.075 </td>
   <td style="text-align:right;"> 0.284 </td>
   <td style="text-align:right;"> -5.191 </td>
   <td style="text-align:right;"> 17.541 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Genre: Thriller </td>
   <td style="text-align:right;"> 5.683 </td>
   <td style="text-align:right;"> 5.214 </td>
   <td style="text-align:right;"> 1.090 </td>
   <td style="text-align:right;"> 0.278 </td>
   <td style="text-align:right;"> -4.636 </td>
   <td style="text-align:right;"> 16.002 </td>
  </tr>
</tbody>
</table>

---

### Connection to Linear Regression

```r
tidy(mod_anova)
```

```
## # A tibble: 2 × 6
##   term         df  sumsq meansq statistic p.value
##   <chr>     <dbl>  <dbl>  <dbl>     <dbl>   <dbl>
## 1 Genre         6  5855.   976.      3.88 0.00137
## 2 Residuals   125 31413.   251.     NA   NA
```

```r
glance(mod_reg)
```

```
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.157         0.117  15.9      3.88 0.00137     6  -548. 1113. 1136.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
```

---

## Many ANOVA Tests Out There!

* We learned the **One-Way** ANOVA test.
--

* **Two-Way**: Have two categorical explanatory variables.

--

* **Repeated Measures ANOVA**: Have multiple observations on each case.
    + All the tests we have focused on (beyond paired data) assumed independent observations.

--

* **ANOVA Tests for Regression**: Allow comparisons of various subsets of a multiple linear regression model.

---

background-image: url("img/hyp_testing_diagram.png")
background-position: contain
background-size: 80%

### Have Learned Two Routes to Statistical Inference

--

Which is **better**?

---

## Is Simulation-Based Inference or Theory-Based Inference Better?

--

Depends on how you define **better**.

.pull-left[
* If **better** = Leads to better understanding:
]

--

.pull-right[
→ Research tends to show students have a better understanding of **p-values** and **confidence** from learning simulation-based methods.
]

--

.pull-left[
* If **better** = More flexible/robust to assumptions:
]

--

.pull-right[
→ The simulation-based methods tend to be more flexible, but that generally requires learning extensions beyond what we've seen in Stat 100.
]

--

.pull-left[
* If **better** = More commonly used:
]

--

.pull-right[
→ Definitely the theory-based methods, but the simulation-based methods are becoming more common.
]

Good to be comfortable with both!
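---

### Connecting the Two Routes

The two routes should agree on the movies example: the theory-based p-value is just the upper tail of the `\(F(6, 125)\)` distribution beyond the observed statistic (a quick sketch using the values from the `aov()` output above; `pf()` is base R's F-distribution CDF):

```r
# F statistic and degrees of freedom from the earlier ANOVA table
f_stat <- 3.88
# Theory-based p-value: area in the upper tail of F(df1 = 6, df2 = 125)
pf(f_stat, df1 = 6, df2 = 125, lower.tail = FALSE)
# close to the 0.00137 reported by aov(); the permutation distribution
# (null_dist) approximates this same tail area by simulation
```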