background-image: url("img/DAW.png")
background-position: left
background-size: 50%
class: middle, center, inverse

.pull-right[

## .whitish[ANOVA Test]
## .whitish[Inference for]
## .whitish[Many Means]

<br>

### .whitish[Kelly McConville]

#### .yellow[ Stat 100 | Week 12 | Spring 2022]

]

---

### Announcements

* Project Assignment 3 is due Friday, April 22nd at 5pm.

****************************

--

### Goals for Today

.pull-left[
* Cover the **ANOVA** test.

* Learn about the F distribution.
]

.pull-right[
* Compare **Simulation** methods versus **Probability Model** methods for inference.

* Start exploring inference for linear regression models.
]

---

### Inference for Many Means

Consider the situation where:

* Response variable: quantitative
* Explanatory variable: categorical

--

* Parameter of interest: `\(\mu_1 - \mu_2\)`

--

This parameter of interest only makes sense if the explanatory variable is restricted to two categories.

--

It is time to learn how to conduct inference for more than two means.

---

### Hypotheses

Consider the situation where:

* Response variable: quantitative
* Explanatory variable: categorical

--

`\(H_o\)`: `\(\mu_1 = \mu_2 = \cdots = \mu_K\)` (Variables are independent/not related.)

`\(H_a\)`: At least one mean is not equal to the rest. (Variables are dependent/related.)

---

### Example

Do audience ratings vary by movie genre?

```r
library(tidyverse)
# Load data
library(Lock5Data)
movies <- HollywoodMovies2011 %>%
  filter(!(Genre %in% c("Fantasy", "Adventure"))) %>%
  drop_na(Genre, AudienceScore)
```

* **Cases**:

* **Variables of interest (including type)**:

* **Hypotheses**:

---

### Example

.pull-left[
Does there appear to be a relationship?
```r
ggplot(data = movies,
       mapping = aes(x = Genre, y = AudienceScore)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point",
               color = "purple", shape = 8, size = 3)
```
]

.pull-right[
<img src="stat100_wk12mon_files/figure-html/movies-1.png" width="768" style="display: block; margin: auto;" />
]

--

What movie did the audience hate so much??

---

### Example

.pull-left[
Does there appear to be a relationship?

```r
bad <- filter(movies, AudienceScore == min(AudienceScore))
ggplot(data = movies,
       mapping = aes(x = Genre, y = AudienceScore)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point",
               color = "purple", shape = 8, size = 3) +
  geom_label(data = bad, mapping = aes(label = Movie))
```
]

.pull-right[
<img src="stat100_wk12mon_files/figure-html/bad-1.png" width="768" style="display: block; margin: auto;" />
]

What movie did the audience hate so much??

---

### Trespass

<img src="img/trespass.001.jpeg" width="1275" style="display: block; margin: auto;" />

---

### Test Statistic

Need a test statistic!

--

* Won't be a sample statistic.

$$ \bar{x}_1 - \bar{x}_2 - \cdots - \bar{x}_K \mbox{ won't work!} $$

--

* Needs to measure the discrepancy between the **observed** sample and the sample **we'd expect** to see if `\(H_o\)` were true.

--

* Would be nice if its null distribution could be approximated by a known probability model.

******************************

Let's return to the **name** of the test.

--

* Called "Analysis of **VARIANCE**" test.

--

* Not called "Analysis of **MEANS**" test.

--

**Question**: Why analyze **variability** to test differences in means?

---

### Why analyze **variability** to test differences in means?

Let's look at some simulated data for a moment.

<img src="stat100_wk12mon_files/figure-html/unnamed-chunk-5-1.png" width="864" style="display: block; margin: auto;" />

**Question**: For which scenario are you most convinced that the means are different?
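--

The intuition can be reproduced with a quick simulation (a sketch; the group means, spreads, and seed below are arbitrary choices, not the values behind the plots above):

```r
set.seed(100)  # arbitrary seed, for reproducibility
means <- rep(c(50, 60, 70), each = 30)  # identical group means in both datasets
groups <- rep(1:3, each = 30)
# Small within-group spread: the mean differences stand out
small_spread <- means + rnorm(90, sd = 2)
# Large within-group spread: the same mean differences get buried in noise
large_spread <- means + rnorm(90, sd = 25)
# Between-group signal is the same; only the within-group variability differs
tapply(small_spread, groups, sd)
tapply(large_spread, groups, sd)
```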
---

### Key Idea: Partitioning the Variability

.pull-left[
<img src="stat100_wk12mon_files/figure-html/unnamed-chunk-6-1.png" width="864" style="display: block; margin: auto;" />
]

.pull-right[
`\begin{align*}
\mbox{Total Variability} & = \\
& \mbox{Variability Between Groups} + \\
& \mbox{Variability Within Groups}
\end{align*}`
]

--

.pull-left[
* Variability **Between** Groups: How much the group means vary
    + Compare the red dots.
]

--

.pull-right[
* Variability **Within** Groups: How much natural group variability there is
    + Within groups, compare the black dots to the red dot.
]

---

### Key Idea: Partitioning the Variability

`\begin{align*}
\mbox{Total Variability} & = \mbox{Variability Between Groups} + \mbox{Variability Within Groups}
\end{align*}`

* Variability **Between** Groups: How much the group means vary
    + Compare the red dots.

`\begin{align*}
\mbox{Variability Between Groups} &= \sum n_i (\bar{x}_i - \bar{x})^2 \\
& = \mbox{Sum of Squares Group} \\
& = \mbox{SSG}
\end{align*}`

--

* Variability **Within** Groups: How much natural group variability there is
    + Within groups, compare the black dots to the red dot.

`\begin{align*}
\mbox{Variability Within Groups} &= \sum (x - \bar{x}_i)^2 \\
& = \mbox{Sum of Squares Error} \\
& = \mbox{SSE}
\end{align*}`

--

* Total Variability: How much points vary from the overall mean

`\begin{align*}
\mbox{Total Variability} &= \sum (x - \bar{x})^2 \\
& = \mbox{Sum of Squares Total} \\
& = \mbox{SSTotal}
\end{align*}`

---

### Mean Squares

Need to standardize the Sums of Squares to compare SSG to SSE.

--

`\begin{align*}
\mbox{Mean Variability Between Groups} & = \frac{\mbox{SSG}}{K - 1} = \mbox{MSG}
\end{align*}`

`\begin{align*}
\mbox{Mean Variability Within Groups} & = \frac{\mbox{SSE}}{n - K} = \mbox{MSE}
\end{align*}`

--

* Now on a comparable scale!

* Now we can create a test statistic that compares these two measures of variability.
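--

As a concrete check, the sums of squares and mean squares can be computed by hand (a sketch, repeating the data setup from the Example slide; with the movies data, the ratio matches the F statistic of 3.88 that appears on a later slide):

```r
library(tidyverse)
library(Lock5Data)   # same data setup as on the Example slide
movies <- HollywoodMovies2011 %>%
  filter(!(Genre %in% c("Fantasy", "Adventure"))) %>%
  drop_na(Genre, AudienceScore)

xbar <- mean(movies$AudienceScore)                      # overall mean
group_means <- ave(movies$AudienceScore, movies$Genre)  # each case's group mean
SSG <- sum((group_means - xbar)^2)                      # between-group sum of squares
SSE <- sum((movies$AudienceScore - group_means)^2)      # within-group sum of squares
SSTotal <- sum((movies$AudienceScore - xbar)^2)
all.equal(SSTotal, SSG + SSE)       # the partition holds
K <- length(unique(movies$Genre))
n <- nrow(movies)
(SSG / (K - 1)) / (SSE / (n - K))   # MSG / MSE
```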
---

### Test Statistic

In some ways, MSG is the natural test statistic, but as we saw for this example, MSG alone isn't enough.

<img src="stat100_wk12mon_files/figure-html/unnamed-chunk-7-1.png" width="864" style="display: block; margin: auto;" />

--

Scenarios 2 and 3 have roughly the same MSG, but we are much more convinced that the means are different for 2 than for 3.

--

That is where MSE comes in!

---

### Test Statistic

$$ F = \frac{\mbox{MSG}}{\mbox{MSE}} = \frac{\mbox{variance between groups}}{\mbox{variance within groups}} $$

If `\(H_o\)` is true, then `\(F\)` should be roughly equal to what?

--

If `\(H_a\)` is true, then `\(F\)` should be greater than 1 because there is more variation in the group means than we'd expect if the population means are all equal.

---

### Returning to the Movies Example

```r
library(infer)
# Compute F test stat
test_stat <- movies %>%
  specify(AudienceScore ~ Genre) %>%
  calculate(stat = "F")
test_stat
```

```
## Response: AudienceScore (numeric)
## Explanatory: Genre (factor)
## # A tibble: 1 × 1
##    stat
##   <dbl>
## 1  3.88
```

--

* Is 3.88 a **large** test statistic? Is a test statistic of 3.88 **unusual** under `\(H_o\)`?

---

### Generating the Null Distribution

.pull-left[

```
##    AudienceScore     Genre
## 1             49    Comedy
## 2             68     Drama
## 3             91     Drama
## 4             62    Comedy
## 5             53     Drama
## 6             73 Animation
## 7             42    Comedy
## 8             76 Animation
## 9             63    Comedy
## 10            54    Comedy
## 11            55    Comedy
## 12            59 Animation
## 13            77    Comedy
## 14            38    Action
## 15            59    Comedy
## 16            50   Romance
## 17            24  Thriller
## 18            61    Comedy
## 19            31    Horror
## 20            70  Thriller
```
]

--

.pull-right[
**Steps**:

1. Shuffle Genre.

2. Compute the `\(MSE\)` and `\(MSG\)`.

3. Compute the test statistic.

4. Repeat 1 - 3 many times.
]

---

### Generating the Null Distribution

.pull-left[

```r
# Construct null distribution
null_dist <- movies %>%
  specify(AudienceScore ~ Genre) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "F")
visualize(null_dist)
```
]

.pull-right[
<img src="stat100_wk12mon_files/figure-html/null-1.png" width="768" style="display: block; margin: auto;" />
]

---

### The Null Distribution

.pull-left[
**Key Observations**:

* Smallest possible value?

<br>

* Shape?
]

.pull-right[
<img src="stat100_wk12mon_files/figure-html/unnamed-chunk-11-1.png" width="576" style="display: block; margin: auto;" />
]

---

### The Null Distribution

.pull-left[
**Key Observations**:

* Smallest possible value?

<br>

* Shape?

<br>

* Is our observed test statistic unusual?
]

.pull-right[
<img src="stat100_wk12mon_files/figure-html/unnamed-chunk-12-1.png" width="576" style="display: block; margin: auto;" />
]

---

### The P-value

```r
# Compute p-value
null_dist %>%
  get_p_value(obs_stat = test_stat, direction = "greater")
```

```
## # A tibble: 1 × 1
##   p_value
##     <dbl>
## 1       0
```

---

### Approximating the Null Distribution

.pull-left[
If

* There are at least 30 observations **in each group** or the response variable is normal

* The variability is similar for all groups

then

$$ \mbox{test statistic} \sim F(df1 = K - 1, df2 = n - K) $$
]

.pull-right[
<img src="stat100_wk12mon_files/figure-html/unnamed-chunk-14-1.png" width="576" style="display: block; margin: auto;" />
]

---

### The ANOVA Test

Check assumptions!

```r
movies %>%
  group_by(Genre) %>%
  summarize(n(), sd(AudienceScore))
```

```
## # A tibble: 7 × 3
##   Genre     `n()` `sd(AudienceScore)`
##   <fct>     <int>               <dbl>
## 1 Action       32                18.4
## 2 Animation    12                13.9
## 3 Comedy       27                15.7
## 4 Drama        21                14.5
## 5 Horror       17                15.9
## 6 Romance      10                12.9
## 7 Thriller     13                14.9
```

---

### The ANOVA Test

Check assumptions!
```r
ggplot(data = movies, mapping = aes(x = AudienceScore)) +
  geom_histogram(bins = 15) +
  facet_wrap(~Genre)
```

<img src="stat100_wk12mon_files/figure-html/unnamed-chunk-16-1.png" width="864" style="display: block; margin: auto;" />

---

### The ANOVA Test

```r
library(broom)
mod_anova <- aov(AudienceScore ~ Genre, data = movies)
tidy(mod_anova)
```

```
## # A tibble: 2 × 6
##   term         df  sumsq meansq statistic p.value
##   <chr>     <dbl>  <dbl>  <dbl>     <dbl>   <dbl>
## 1 Genre         6  5855.   976.      3.88 0.00137
## 2 Residuals   125 31413.   251.     NA   NA
```

---

### Connection to Linear Regression

```r
library(moderndive)
mod_reg <- lm(AudienceScore ~ Genre, data = movies)
get_regression_table(mod_reg, print = TRUE)
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> term </th>
   <th style="text-align:right;"> estimate </th>
   <th style="text-align:right;"> std_error </th>
   <th style="text-align:right;"> statistic </th>
   <th style="text-align:right;"> p_value </th>
   <th style="text-align:right;"> lower_ci </th>
   <th style="text-align:right;"> upper_ci </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> intercept </td>
   <td style="text-align:right;"> 58.625 </td>
   <td style="text-align:right;"> 2.802 </td>
   <td style="text-align:right;"> 20.920 </td>
   <td style="text-align:right;"> 0.000 </td>
   <td style="text-align:right;"> 53.079 </td>
   <td style="text-align:right;"> 64.171 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Genre: Animation </td>
   <td style="text-align:right;"> 5.458 </td>
   <td style="text-align:right;"> 5.366 </td>
   <td style="text-align:right;"> 1.017 </td>
   <td style="text-align:right;"> 0.311 </td>
   <td style="text-align:right;"> -5.162 </td>
   <td style="text-align:right;"> 16.079 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Genre: Comedy </td>
   <td style="text-align:right;"> 0.486 </td>
   <td style="text-align:right;"> 4.143 </td>
   <td style="text-align:right;"> 0.117 </td>
   <td style="text-align:right;"> 0.907 </td>
   <td style="text-align:right;"> -7.713 </td>
   <td style="text-align:right;"> 8.685 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Genre: Drama </td>
   <td style="text-align:right;"> 13.470 </td>
   <td style="text-align:right;"> 4.452 </td>
   <td style="text-align:right;"> 3.026 </td>
   <td style="text-align:right;"> 0.003 </td>
   <td style="text-align:right;"> 4.659 </td>
   <td style="text-align:right;"> 22.281 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Genre: Horror </td>
   <td style="text-align:right;"> -9.978 </td>
   <td style="text-align:right;"> 4.758 </td>
   <td style="text-align:right;"> -2.097 </td>
   <td style="text-align:right;"> 0.038 </td>
   <td style="text-align:right;"> -19.394 </td>
   <td style="text-align:right;"> -0.562 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Genre: Romance </td>
   <td style="text-align:right;"> 6.175 </td>
   <td style="text-align:right;"> 5.743 </td>
   <td style="text-align:right;"> 1.075 </td>
   <td style="text-align:right;"> 0.284 </td>
   <td style="text-align:right;"> -5.191 </td>
   <td style="text-align:right;"> 17.541 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Genre: Thriller </td>
   <td style="text-align:right;"> 5.683 </td>
   <td style="text-align:right;"> 5.214 </td>
   <td style="text-align:right;"> 1.090 </td>
   <td style="text-align:right;"> 0.278 </td>
   <td style="text-align:right;"> -4.636 </td>
   <td style="text-align:right;"> 16.002 </td>
  </tr>
</tbody>
</table>

---

### Connection to Linear Regression

```r
tidy(mod_anova)
```

```
## # A tibble: 2 × 6
##   term         df  sumsq meansq statistic p.value
##   <chr>     <dbl>  <dbl>  <dbl>     <dbl>   <dbl>
## 1 Genre         6  5855.   976.      3.88 0.00137
## 2 Residuals   125 31413.   251.     NA   NA
```

```r
glance(mod_reg)
```

```
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.157         0.117  15.9      3.88 0.00137     6  -548. 1113. 1136.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
```

---

## Many ANOVA Tests Out There!

* We learned the **One-Way** ANOVA test.
--

* **Two-Way**: Have two categorical explanatory variables.

--

* **Repeated Measures ANOVA**: Have multiple observations on each case.
    + All the tests we have focused on (beyond paired data) assumed independent observations.

--

* **ANOVA Tests for Regression**: Allow comparisons of various subsets of a multiple linear regression model.

---

background-image: url("img/hyp_testing_diagram.png")
background-position: contain
background-size: 80%

### Have Learned Two Routes to Statistical Inference

--

Which is **better**?

---

## Is Simulation-Based Inference or Theory-Based Inference Better?

--

Depends on how you define **better**.

.pull-left[
* If **better** = Leads to better understanding:
]

--

.pull-right[
→ Research tends to show students have a better understanding of **p-values** and **confidence** from learning simulation-based methods.
]

--

.pull-left[
* If **better** = More flexible/robust to assumptions:
]

--

.pull-right[
→ The simulation-based methods tend to be more flexible, but that generally requires learning extensions beyond what we've seen in Stat 100.
]

--

.pull-left[
* If **better** = More commonly used:
]

--

.pull-right[
→ Definitely the theory-based methods, but the simulation-based methods are becoming more common.
]

Good to be comfortable with both!
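---

### Connecting the Two Routes

The two routes should agree on the movies example: the theory-based p-value is just the upper tail of the `\(F(6, 125)\)` distribution beyond the observed statistic (a quick sketch using the values from the `aov()` output above; `pf()` is base R's F-distribution CDF):

```r
# F statistic and degrees of freedom from the earlier ANOVA table
f_stat <- 3.88
# Theory-based p-value: area in the upper tail of F(df1 = 6, df2 = 125)
pf(f_stat, df1 = 6, df2 = 125, lower.tail = FALSE)
# close to the 0.00137 reported by aov(); the permutation distribution
# (null_dist) approximates this same tail area by simulation
```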