background-image: url("img/DAW.png")
background-position: left
background-size: 50%
class: middle, center

.pull-right[

## .base-blue[Inference]
## .base-blue[for]
## .base-blue[Categorical Variables]

<br>

### .purple[Kelly McConville]

#### .purple[ Stat 100 | Week 14 | Fall 2022]

]

---
class: middle, center

### One more reminder to RSVP to the `ggparty`:

### [bit.ly/stat100-ggparty](https://bit.ly/stat100-ggparty)

---

### Announcements

* Extra Credit Lecture Quiz due Dec 1st.
* P-Set 9 (the LAST p-set) is due Th, Dec 1st!
* Project Assignment 3 is due Tu, Dec 6th.
    + Also provide [feedback on work distribution](https://forms.gle/EJ1zgcC3c5rNrn9f7).
* Let's discuss the review sheet.

****************************

--

### Goals for Today

.pull-left[

* Learn about Chi-Squared Random Variables.
* Consider inference for categorical variables with **more than** 2 categories!

]

.pull-right[

* Quick wrap-up!

]

---

### Inference for Categorical Variables

Consider the situation where:

* Response variable: categorical
* Explanatory variable: categorical

--

* Parameter of interest: `\(p_1 - p_2\)`

--

This parameter of interest only makes sense if **both** variables only have two categories.

--

It is time to learn how to study the relationship between two categorical variables when **at least one has more than two categories.**

---

### Hypotheses

Consider the situation where:

* Response variable: categorical
* Explanatory variable: categorical

--

`\(H_o\)`: The two variables are independent.

`\(H_a\)`: The two variables are dependent.

---

### Example

Near-sightedness typically develops during the childhood years. Quinn, Shin, Maguire, and Stone (1999) explored whether there is a relationship between the type of light children were exposed to and their eye health based on questionnaires filled out by the children's parents at a university pediatric ophthalmology clinic.
```r
library(tidyverse)
library(infer)

# Import data
eye_data <- read_csv("~/shared_data/stat100/data/eye_lighting.csv")

# Contingency table
eye_data %>%
  count(Lighting, Eye)
```

```
## # A tibble: 9 × 3
##   Lighting Eye        n
##   <chr>    <chr>  <int>
## 1 dark     Far       40
## 2 dark     Near      18
## 3 dark     Normal   114
## 4 night    Far       39
## 5 night    Near      78
## 6 night    Normal   115
## 7 room     Far       12
## 8 room     Near      41
## 9 room     Normal    22
```

---

### Eyesight Example

Does there appear to be a relationship/dependence?

.pull-left[

```r
ggplot(data = eye_data,
       mapping = aes(x = Lighting, fill = Eye)) +
  geom_bar(position = "fill")
```

]

.pull-right[

<img src="stat100_wk14wed_files/figure-html/eyeplot-1.png" width="768" style="display: block; margin: auto;" />

]

---

### Eyesight Example

Need a test statistic!

--

* Won't be a single sample statistic.

--

* Needs to measure the discrepancy between the observed sample and the sample we'd expect to see if `\(H_o\)` (no relationship) were true.

--

* Would be nice if its null distribution could be approximated by a known probability model.
---

#### Table of Observed Results

```r
library(knitr)  # provides kable()

table(eye_data$Eye, eye_data$Lighting) %>%
  addmargins() %>%
  kable(format = "html")
```

<table>
<thead>
<tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> dark </th> <th style="text-align:right;"> night </th> <th style="text-align:right;"> room </th> <th style="text-align:right;"> Sum </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Far </td> <td style="text-align:right;"> 40 </td> <td style="text-align:right;"> 39 </td> <td style="text-align:right;"> 12 </td> <td style="text-align:right;"> 91 </td> </tr>
<tr> <td style="text-align:left;"> Near </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 78 </td> <td style="text-align:right;"> 41 </td> <td style="text-align:right;"> 137 </td> </tr>
<tr> <td style="text-align:left;"> Normal </td> <td style="text-align:right;"> 114 </td> <td style="text-align:right;"> 115 </td> <td style="text-align:right;"> 22 </td> <td style="text-align:right;"> 251 </td> </tr>
<tr> <td style="text-align:left;"> Sum </td> <td style="text-align:right;"> 172 </td> <td style="text-align:right;"> 232 </td> <td style="text-align:right;"> 75 </td> <td style="text-align:right;"> 479 </td> </tr>
</tbody>
</table>

--

**Question**: If `\(H_o\)` were correct, is this the table that we'd expect to see?
<table>
<thead>
<tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> dark </th> <th style="text-align:right;"> night </th> <th style="text-align:right;"> room </th> <th style="text-align:right;"> Sum </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Far </td> <td style="text-align:right;"> 53 </td> <td style="text-align:right;"> 53 </td> <td style="text-align:right;"> 53 </td> <td style="text-align:right;"> 159 </td> </tr>
<tr> <td style="text-align:left;"> Near </td> <td style="text-align:right;"> 53 </td> <td style="text-align:right;"> 53 </td> <td style="text-align:right;"> 53 </td> <td style="text-align:right;"> 159 </td> </tr>
<tr> <td style="text-align:left;"> Normal </td> <td style="text-align:right;"> 53 </td> <td style="text-align:right;"> 53 </td> <td style="text-align:right;"> 53 </td> <td style="text-align:right;"> 159 </td> </tr>
<tr> <td style="text-align:left;"> Sum </td> <td style="text-align:right;"> 159 </td> <td style="text-align:right;"> 159 </td> <td style="text-align:right;"> 159 </td> <td style="text-align:right;"> 477 </td> </tr>
</tbody>
</table>

---

#### Table of Observed Results

<table>
<thead>
<tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> dark </th> <th style="text-align:right;"> night </th> <th style="text-align:right;"> room </th> <th style="text-align:right;"> Sum </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Far </td> <td style="text-align:right;"> 40 </td> <td style="text-align:right;"> 39 </td> <td style="text-align:right;"> 12 </td> <td style="text-align:right;"> 91 </td> </tr>
<tr> <td style="text-align:left;"> Near </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 78 </td> <td style="text-align:right;"> 41 </td> <td style="text-align:right;"> 137 </td> </tr>
<tr> <td style="text-align:left;"> Normal </td> <td style="text-align:right;"> 114 </td> <td style="text-align:right;"> 115 </td> <td style="text-align:right;"> 22 </td>
<td style="text-align:right;"> 251 </td> </tr> <tr> <td style="text-align:left;"> Sum </td> <td style="text-align:right;"> 172 </td> <td style="text-align:right;"> 232 </td> <td style="text-align:right;"> 75 </td> <td style="text-align:right;"> 479 </td> </tr> </tbody> </table> **Question**: If `\(H_o\)` were correct, what table would we expect to see? Want a `\(H_o\)` table that respects the eye condition proportions: `$$\hat{p}_{far} = 91/479$$` `$$\hat{p}_{nor} = 251/479$$` `$$\hat{p}_{nea} = 137/479$$` --- #### Table of Observed Results <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> dark </th> <th style="text-align:right;"> night </th> <th style="text-align:right;"> room </th> <th style="text-align:right;"> Sum </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Far </td> <td style="text-align:right;"> 40 </td> <td style="text-align:right;"> 39 </td> <td style="text-align:right;"> 12 </td> <td style="text-align:right;"> 91 </td> </tr> <tr> <td style="text-align:left;"> Near </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 78 </td> <td style="text-align:right;"> 41 </td> <td style="text-align:right;"> 137 </td> </tr> <tr> <td style="text-align:left;"> Normal </td> <td style="text-align:right;"> 114 </td> <td style="text-align:right;"> 115 </td> <td style="text-align:right;"> 22 </td> <td style="text-align:right;"> 251 </td> </tr> <tr> <td style="text-align:left;"> Sum </td> <td style="text-align:right;"> 172 </td> <td style="text-align:right;"> 232 </td> <td style="text-align:right;"> 75 </td> <td style="text-align:right;"> 479 </td> </tr> </tbody> </table> **Question**: If `\(H_o\)` were correct, what table would we expect to see? 
<table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> dark </th> <th style="text-align:right;"> night </th> <th style="text-align:right;"> room </th> <th style="text-align:right;"> Sum </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Far </td> <td style="text-align:right;"> (91/479)172 </td> <td style="text-align:right;"> (91/479)232 </td> <td style="text-align:right;"> (91/479)75 </td> <td style="text-align:right;"> 91 </td> </tr> <tr> <td style="text-align:left;"> Near </td> <td style="text-align:right;"> (137/479)172 </td> <td style="text-align:right;"> (137/479)232 </td> <td style="text-align:right;"> (137/479)75 </td> <td style="text-align:right;"> 137 </td> </tr> <tr> <td style="text-align:left;"> Normal </td> <td style="text-align:right;"> (251/479)172 </td> <td style="text-align:right;"> (251/479)232 </td> <td style="text-align:right;"> (251/479)75 </td> <td style="text-align:right;"> 251 </td> </tr> <tr> <td style="text-align:left;"> Sum </td> <td style="text-align:right;"> 172 </td> <td style="text-align:right;"> 232 </td> <td style="text-align:right;"> 75 </td> <td style="text-align:right;"> 479 </td> </tr> </tbody> </table> -- * Still have the same totals but distributed the values differently within the table --- #### Table of Observed Results <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> dark </th> <th style="text-align:right;"> night </th> <th style="text-align:right;"> room </th> <th style="text-align:right;"> Sum </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Far </td> <td style="text-align:right;"> 40 </td> <td style="text-align:right;"> 39 </td> <td style="text-align:right;"> 12 </td> <td style="text-align:right;"> 91 </td> </tr> <tr> <td style="text-align:left;"> Near </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 78 </td> <td style="text-align:right;"> 41 </td> <td style="text-align:right;"> 137 </td> </tr> 
<tr> <td style="text-align:left;"> Normal </td> <td style="text-align:right;"> 114 </td> <td style="text-align:right;"> 115 </td> <td style="text-align:right;"> 22 </td> <td style="text-align:right;"> 251 </td> </tr> <tr> <td style="text-align:left;"> Sum </td> <td style="text-align:right;"> 172 </td> <td style="text-align:right;"> 232 </td> <td style="text-align:right;"> 75 </td> <td style="text-align:right;"> 479 </td> </tr> </tbody> </table> **Question**: If `\(H_o\)` were correct, what table would we expect to see? <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> dark </th> <th style="text-align:right;"> night </th> <th style="text-align:right;"> room </th> <th style="text-align:right;"> Sum </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Far </td> <td style="text-align:right;"> 32.67641 </td> <td style="text-align:right;"> 44.07516 </td> <td style="text-align:right;"> 14.24843 </td> <td style="text-align:right;"> 91 </td> </tr> <tr> <td style="text-align:left;"> Near </td> <td style="text-align:right;"> 49.19415 </td> <td style="text-align:right;"> 66.35491 </td> <td style="text-align:right;"> 21.45094 </td> <td style="text-align:right;"> 137 </td> </tr> <tr> <td style="text-align:left;"> Normal </td> <td style="text-align:right;"> 90.12944 </td> <td style="text-align:right;"> 121.56994 </td> <td style="text-align:right;"> 39.30063 </td> <td style="text-align:right;"> 251 </td> </tr> <tr> <td style="text-align:left;"> Sum </td> <td style="text-align:right;"> 172.00000 </td> <td style="text-align:right;"> 232.00000 </td> <td style="text-align:right;"> 75.00000 </td> <td style="text-align:right;"> 479 </td> </tr> </tbody> </table> --- ### Expected Table .pull-left[ * How does this table represent `\(H_o\)`? 
<table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> dark </th> <th style="text-align:right;"> night </th> <th style="text-align:right;"> room </th> <th style="text-align:right;"> Sum </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Far </td> <td style="text-align:right;"> 32.67641 </td> <td style="text-align:right;"> 44.07516 </td> <td style="text-align:right;"> 14.24843 </td> <td style="text-align:right;"> 91 </td> </tr> <tr> <td style="text-align:left;"> Near </td> <td style="text-align:right;"> 49.19415 </td> <td style="text-align:right;"> 66.35491 </td> <td style="text-align:right;"> 21.45094 </td> <td style="text-align:right;"> 137 </td> </tr> <tr> <td style="text-align:left;"> Normal </td> <td style="text-align:right;"> 90.12944 </td> <td style="text-align:right;"> 121.56994 </td> <td style="text-align:right;"> 39.30063 </td> <td style="text-align:right;"> 251 </td> </tr> <tr> <td style="text-align:left;"> Sum </td> <td style="text-align:right;"> 172.00000 </td> <td style="text-align:right;"> 232.00000 </td> <td style="text-align:right;"> 75.00000 </td> <td style="text-align:right;"> 479 </td> </tr> </tbody> </table> ] -- .pull-right[ <img src="stat100_wk14wed_files/figure-html/unnamed-chunk-10-1.png" width="576" style="display: block; margin: auto;" /> ] --- ### Test Statistic Want the test statistic to quantify the difference between the observed table and the expected table. 
<table style="display: inline-block;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> dark </th> <th style="text-align:right;"> night </th> <th style="text-align:right;"> room </th> <th style="text-align:right;"> Sum </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Far </td> <td style="text-align:right;"> 40 </td> <td style="text-align:right;"> 39 </td> <td style="text-align:right;"> 12 </td> <td style="text-align:right;"> 91 </td> </tr> <tr> <td style="text-align:left;"> Near </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 78 </td> <td style="text-align:right;"> 41 </td> <td style="text-align:right;"> 137 </td> </tr> <tr> <td style="text-align:left;"> Normal </td> <td style="text-align:right;"> 114 </td> <td style="text-align:right;"> 115 </td> <td style="text-align:right;"> 22 </td> <td style="text-align:right;"> 251 </td> </tr> <tr> <td style="text-align:left;"> Sum </td> <td style="text-align:right;"> 172 </td> <td style="text-align:right;"> 232 </td> <td style="text-align:right;"> 75 </td> <td style="text-align:right;"> 479 </td> </tr> </tbody> </table> <table style="display: inline-block;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> dark </th> <th style="text-align:right;"> night </th> <th style="text-align:right;"> room </th> <th style="text-align:right;"> Sum </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Far </td> <td style="text-align:right;"> 32.68 </td> <td style="text-align:right;"> 44.08 </td> <td style="text-align:right;"> 14.25 </td> <td style="text-align:right;"> 91 </td> </tr> <tr> <td style="text-align:left;"> Near </td> <td style="text-align:right;"> 49.19 </td> <td style="text-align:right;"> 66.35 </td> <td style="text-align:right;"> 21.45 </td> <td style="text-align:right;"> 137 </td> </tr> <tr> <td style="text-align:left;"> Normal </td> <td style="text-align:right;"> 90.13 </td> <td style="text-align:right;"> 
121.57 </td> <td style="text-align:right;"> 39.30 </td> <td style="text-align:right;"> 251 </td> </tr> <tr> <td style="text-align:left;"> Sum </td> <td style="text-align:right;"> 172.00 </td> <td style="text-align:right;"> 232.00 </td> <td style="text-align:right;"> 75.00 </td> <td style="text-align:right;"> 479 </td> </tr> </tbody> </table> -- For each cell: Compute a **Z-score**! -- `\begin{align*} \mbox{Z-score} &= \frac{\mbox{stat - mean}}{\mbox{SE}} \\ & = \frac{\mbox{observed - expected}}{\sqrt{\mbox{expected}}} \end{align*}` --- ### Test Statistic Want the test statistic to quantify the difference between the observed table and the expected table. <table style="display: inline-block;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> dark </th> <th style="text-align:right;"> night </th> <th style="text-align:right;"> room </th> <th style="text-align:right;"> Sum </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Far </td> <td style="text-align:right;"> 40 </td> <td style="text-align:right;"> 39 </td> <td style="text-align:right;"> 12 </td> <td style="text-align:right;"> 91 </td> </tr> <tr> <td style="text-align:left;"> Near </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 78 </td> <td style="text-align:right;"> 41 </td> <td style="text-align:right;"> 137 </td> </tr> <tr> <td style="text-align:left;"> Normal </td> <td style="text-align:right;"> 114 </td> <td style="text-align:right;"> 115 </td> <td style="text-align:right;"> 22 </td> <td style="text-align:right;"> 251 </td> </tr> <tr> <td style="text-align:left;"> Sum </td> <td style="text-align:right;"> 172 </td> <td style="text-align:right;"> 232 </td> <td style="text-align:right;"> 75 </td> <td style="text-align:right;"> 479 </td> </tr> </tbody> </table> <table style="display: inline-block;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> dark </th> <th style="text-align:right;"> night </th> <th 
style="text-align:right;"> room </th> <th style="text-align:right;"> Sum </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Far </td> <td style="text-align:right;"> 32.68 </td> <td style="text-align:right;"> 44.08 </td> <td style="text-align:right;"> 14.25 </td> <td style="text-align:right;"> 91 </td> </tr> <tr> <td style="text-align:left;"> Near </td> <td style="text-align:right;"> 49.19 </td> <td style="text-align:right;"> 66.35 </td> <td style="text-align:right;"> 21.45 </td> <td style="text-align:right;"> 137 </td> </tr> <tr> <td style="text-align:left;"> Normal </td> <td style="text-align:right;"> 90.13 </td> <td style="text-align:right;"> 121.57 </td> <td style="text-align:right;"> 39.30 </td> <td style="text-align:right;"> 251 </td> </tr> <tr> <td style="text-align:left;"> Sum </td> <td style="text-align:right;"> 172.00 </td> <td style="text-align:right;"> 232.00 </td> <td style="text-align:right;"> 75.00 </td> <td style="text-align:right;"> 479 </td> </tr> </tbody> </table> **Test Statistic:** `\begin{align*} \chi^2 = \sum \left(\frac{\mbox{observed - expected}}{\sqrt{\mbox{expected}}} \right)^2 \end{align*}` -- → Large test statistics signify that results are unusual if `\(H_o\)` is true. 
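---

### Test Statistic

As a rough check on the formula, here is a minimal base-R sketch that rebuilds the expected table and the statistic from the observed counts. The counts are typed in by hand from the observed table, and the object names (`observed`, `expected`, `z_scores`, `chi_sq`) are illustrative choices, not part of the course code.

```r
# Observed counts typed in from the observed table
# (rows: Eye, columns: Lighting); names here are illustrative.
observed <- matrix(
  c( 40,  39, 12,
     18,  78, 41,
    114, 115, 22),
  nrow = 3, byrow = TRUE,
  dimnames = list(Eye = c("Far", "Near", "Normal"),
                  Lighting = c("dark", "night", "room"))
)

# Total sample size
n <- sum(observed)

# Expected count in each cell if the variables are independent:
# (row total / n) * column total
expected <- outer(rowSums(observed), colSums(observed)) / n

# One z-score per cell, then square and add them up
z_scores <- (observed - expected) / sqrt(expected)
chi_sq <- sum(z_scores^2)

round(expected, 2)
chi_sq
```

Under these assumptions, `chi_sq` should come out near 56.5, and the cells with the largest squared z-scores (Near/dark and Near/room) are the ones driving the discrepancy.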
--- <table style="display: inline-block;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> dark </th> <th style="text-align:right;"> night </th> <th style="text-align:right;"> room </th> <th style="text-align:right;"> Sum </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Far </td> <td style="text-align:right;"> 40 </td> <td style="text-align:right;"> 39 </td> <td style="text-align:right;"> 12 </td> <td style="text-align:right;"> 91 </td> </tr> <tr> <td style="text-align:left;"> Near </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 78 </td> <td style="text-align:right;"> 41 </td> <td style="text-align:right;"> 137 </td> </tr> <tr> <td style="text-align:left;"> Normal </td> <td style="text-align:right;"> 114 </td> <td style="text-align:right;"> 115 </td> <td style="text-align:right;"> 22 </td> <td style="text-align:right;"> 251 </td> </tr> <tr> <td style="text-align:left;"> Sum </td> <td style="text-align:right;"> 172 </td> <td style="text-align:right;"> 232 </td> <td style="text-align:right;"> 75 </td> <td style="text-align:right;"> 479 </td> </tr> </tbody> </table> <table style="display: inline-block;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> dark </th> <th style="text-align:right;"> night </th> <th style="text-align:right;"> room </th> <th style="text-align:right;"> Sum </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Far </td> <td style="text-align:right;"> 32.68 </td> <td style="text-align:right;"> 44.08 </td> <td style="text-align:right;"> 14.25 </td> <td style="text-align:right;"> 91 </td> </tr> <tr> <td style="text-align:left;"> Near </td> <td style="text-align:right;"> 49.19 </td> <td style="text-align:right;"> 66.35 </td> <td style="text-align:right;"> 21.45 </td> <td style="text-align:right;"> 137 </td> </tr> <tr> <td style="text-align:left;"> Normal </td> <td style="text-align:right;"> 90.13 </td> <td style="text-align:right;"> 
121.57 </td> <td style="text-align:right;"> 39.30 </td> <td style="text-align:right;"> 251 </td> </tr>
<tr> <td style="text-align:left;"> Sum </td> <td style="text-align:right;"> 172.00 </td> <td style="text-align:right;"> 232.00 </td> <td style="text-align:right;"> 75.00 </td> <td style="text-align:right;"> 479 </td> </tr>
</tbody>
</table>

```r
library(infer)

# Compute Chi-square test stat
test_stat <- eye_data %>%
  specify(Eye ~ Lighting) %>%
  calculate(stat = "Chisq")
test_stat
```

```
## Response: Eye (factor)
## Explanatory: Lighting (factor)
## # A tibble: 1 × 1
##    stat
##   <dbl>
## 1  56.5
```

--

Is 56.5 **large**? Is 56.5 **unusual** under `\(H_o\)`?

---

### Generating the Null Distribution

#### Steps:

1. Shuffle lighting.
2. Compute the new observed table.
3. Compute the test statistic.
4. Repeat 1 - 3 many times.

```
## # A tibble: 10 × 2
##    Eye    Lighting
##    <chr>  <chr>
##  1 Far    night
##  2 Far    dark
##  3 Near   room
##  4 Far    dark
##  5 Normal room
##  6 Near   room
##  7 Normal night
##  8 Normal dark
##  9 Far    night
## 10 Near   room
```

---

### Generating the Null Distribution

.pull-left[

```r
# Construct null distribution
null_dist <- eye_data %>%
  specify(Eye ~ Lighting) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "Chisq")

visualize(null_dist)
```

]

.pull-right[

<img src="stat100_wk14wed_files/figure-html/null-1.png" width="768" style="display: block; margin: auto;" />

]

---

### The Null Distribution

.pull-left[

**Key Observations about the distribution**:

* Smallest possible value?
* Shape?

]

.pull-right[

<img src="stat100_wk14wed_files/figure-html/unnamed-chunk-14-1.png" width="576" style="display: block; margin: auto;" />

]

---

### The Null Distribution

.pull-left[

**Key Observations about the distribution**:

* Smallest possible value?
* Shape?
* Is our observed test statistic of 56.5 unusual?
]

.pull-right[

<img src="stat100_wk14wed_files/figure-html/unnamed-chunk-15-1.png" width="576" style="display: block; margin: auto;" />

]

---

### The P-value

```r
# Compute p-value
null_dist %>%
  get_pvalue(obs_stat = test_stat, direction = "greater")
```

```
## # A tibble: 1 × 1
##   p_value
##     <dbl>
## 1       0
```

---

### Approximating the Null Distribution

.pull-left[

If there are at least 5 observations in each cell, then

$$ \mbox{test statistic} \sim \chi^2(df = (\mbox{# of rows} - 1)(\mbox{# of columns} - 1)) $$

The `\(df\)` controls the center and spread of the distribution.

]

.pull-right[

<img src="stat100_wk14wed_files/figure-html/unnamed-chunk-17-1.png" width="576" style="display: block; margin: auto;" />

]

---

### The Chi-Squared Test

```r
chisq.test(table(eye_data$Eye, eye_data$Lighting))
```

```
##
##  Pearson's Chi-squared test
##
## data:  table(eye_data$Eye, eye_data$Lighting)
## X-squared = 56.513, df = 4, p-value = 1.565e-11
```

--

Conclusions?

--

* Causation?

--

* Decisions, decisions...

---
background-image: url("img/DAW.png")
background-position: left
background-size: 50%
class: middle

.pull-right[

## Before talking about what else there is to study...

]

--

.pull-right[

## Let's acknowledge that we actually learned a lot this semester!

]

---
background-image: url("img/DAW.png")
background-position: left
background-size: 50%
class: middle

.pull-right[

#### (Some of the) Course Learning Objectives

]

--

.pull-right[

* Learn how to **analyze** data in `R`.

]

--

.pull-right[

* Master **creating** graphs with `ggplot2`.

]

--

.pull-right[

* Apply data wrangling operations with `dplyr`.

]

--

.pull-right[

* Translate a research problem into a set of relevant questions that can be answered with data.

]

--

.pull-right[

* Reflect on how **sample design structures** impact potential conclusions.

]

--

.pull-right[

* Appropriately apply and draw inferences from a statistical model, including **quantifying and interpreting the uncertainty** in model estimates.
]

--

.pull-right[

* Develop a **reproducible** workflow using `R` Markdown documents.

]

---
class: middle, center

## What if I want to learn more stats and data science?

--

#### For more modeling: Stat 139: Linear Models

#### For more theory and inference: Stat 110: Introduction to Probability Theory and Stat 111: Introduction to Statistical Inference

#### For more coding: Stat 108: Introduction to Statistical Computing in R

---
class: middle, center

# Thanks for a wonderful semester!