background-image: url("img/DAW.png")
background-position: left
background-size: 50%
class: middle, center, inverse

.pull-right[
## .whitish[Where to Go]
## .whitish[From Here]
## .whitish[and Review]

<br>

### .whitish[Kelly McConville]

#### .yellow[ Stat 100 | Week 13 | Spring 2022]
]

---

background-image: url("img/ggparty.003.jpeg")
background-position: contain
background-size: 90%

---

class: middle, center

# `ggparty`

Rain Location: Science Center 316

---

### Announcements

* [The OH schedule](https://docs.google.com/spreadsheets/d/18ckvWMtKWJrpq6YS-FuToAMrbHPP722FF79ZvuZ01Ys/edit?usp=sharing) for the next couple of weeks.
* Final Project Assignment due Friday, May 6th.
* No sections this week!

****************************

--

### Goals for Today

.pull-left[
* Potential future statistical endeavors!
]

.pull-right[
* Review for our current statistical endeavor!
]

---

background-image: url("img/DAW.png")
background-position: left
background-size: 50%
class: middle

.pull-right[
## Before talking about all the things we didn't have time to cover...
]

--

.pull-right[
## Let's acknowledge that we actually learned a lot!
]

---

background-image: url("img/DAW.png")
background-position: left
background-size: 50%
class: middle

.pull-right[
#### (Some of the) Course Learning Objectives
]

--

.pull-right[
* Learn how to **analyze** data in `R`.
]

--

.pull-right[
* Master **creating** graphs with `ggplot2`.
]

--

.pull-right[
* Apply data wrangling operations with `dplyr`.
]

--

.pull-right[
* Translate a research problem into a set of relevant questions that can be answered with data.
]

--

.pull-right[
* Reflect on how **sample design structures** impact potential conclusions.
]

--

.pull-right[
* Appropriately apply and draw inferences from a statistical model, including **quantifying and interpreting the uncertainty** in model estimates.
]

--

.pull-right[
* Develop a **reproducible** workflow using `R` Markdown documents.
]

---

class: inverse, middle, center

## What else should we learn?

---

name: acquisition
background-image: url("img/data_acquisition.jpeg")
background-position: left
background-size: 15%

.pull-rightish[
### Data Acquisition

* How to handle data that were collected using a **non-simple random sampling design**

<img src="stat100_wk13wed_files/figure-html/unnamed-chunk-1-1.png" width="504" style="display: block; margin: auto;" />
]

---

name: acquisition
background-image: url("img/data_acquisition.jpeg")
background-position: left
background-size: 15%

.pull-rightish[
### Data Acquisition

* How to handle data that were collected using a **non-simple random sampling design**

<img src="stat100_wk13wed_files/figure-html/unnamed-chunk-2-1.png" width="504" style="display: block; margin: auto;" />

* Consider:
    + Stat 160: Introduction to Survey Sampling and Estimation
    + Gov 1010: Survey Research Methods
    + SOCIOL 157: Qualitative Methods in Sociology
]

---

name: acquisition
background-image: url("img/data_acquisition.jpeg")
background-position: left
background-size: 15%

.pull-rightish[
### Data Acquisition

* How to draw **causal** conclusions from observational data (i.e., data collected without **random assignment**)

<img src="img/correlation_xkcd.png" width="465" style="display: block; margin: auto;" />
]

--

.pull-rightish[
* Consider:
    + Any class that mentions "**causal inference**" in the description.
    + Stat 186: Introduction to Causal Inference
]

---

background-image: url("img/exploration_visualization.jpeg")
background-position: left
background-size: 15%

.pull-rightish[
## Data Visualization

* How to fully customize your `ggplots`

<img src="stat100_wk13wed_files/figure-html/unnamed-chunk-5-1.png" width="576" style="display: block; margin: auto;" />
]

---

background-image: url("img/exploration_visualization.jpeg")
background-position: left
background-size: 15%

.pull-rightish[
## Data Visualization

* How to fully customize your `ggplots`

<img src="stat100_wk13wed_files/figure-html/unnamed-chunk-6-1.png" width="576" style="display: block; margin: auto;" />
]

---

background-image: url("img/exploration_visualization.jpeg")
background-position: left
background-size: 15%

.pull-rightish[
## Data Visualization

* How to create graphs we haven't seen in Stat 100

.pull-left[
<img src="stat100_wk13wed_files/figure-html/unnamed-chunk-7-1.png" width="576" style="display: block; margin: auto;" />
]

.pull-right[
<img src="stat100_wk13wed_files/figure-html/unnamed-chunk-8-1.png" width="576" style="display: block; margin: auto;" />
]
]

---

background-image: url("img/exploration_visualization.jpeg")
background-position: left
background-size: 15%

.pull-rightish[
## Data Visualization

* How to graph **spatial** data
]

---

background-image: url("img/exploration_visualization.jpeg")
background-position: left
background-size: 15%

.pull-rightish[
## Data Visualization

* How to create [fancy dashboards](https://ncasi-shiny-tools.shinyapps.io/Counties/) with the `R` package `shiny`

* Consider:
    + Stat 1XX: "Introduction to Statistical Computing in R" or "Introduction to Computing, Wrangling, Scraping, and Visualizing in R" or ...
    + Data Science 1: Introduction to Data Science
    + A GIS course
    + COMPSCI 171: Visualization
]

---

name: inference
background-image: url("img/modeling_inference.jpeg")
background-position: left
background-size: 15%

.pull-rightish[
## Modeling and Inference

* How to conduct more sophisticated **model selection**

<img src="img/fs.png" width="25%" style="float:left; padding:10px" style="display: block; margin: auto;" />

**Mission**: "Make and keep current a comprehensive inventory and analysis of the present and prospective conditions of and requirements for the renewable resources of the forest and rangelands of the US."

**Goal**: Estimate the number of trees per acre for a given plot of land.
]

---

name: inference
background-image: url("img/modeling_inference.jpeg")
background-position: left
background-size: 15%

.pull-rightish[
## Modeling and Inference

* How to conduct more sophisticated **model selection**

<img src="img/FIA_EDA.jpeg" width="1736" style="display: block; margin: auto;" />
]

---

name: inference
background-image: url("img/modeling_inference.jpeg")
background-position: left
background-size: 15%

.pull-rightish[
## Modeling and Inference

* How to conduct more sophisticated **model selection**

* **LASSO Method**: Find `\(\hat{\beta}\)`'s based on the following criterion:

`\begin{aligned} \boldsymbol{\hat{\beta}} &= \underset{\boldsymbol{\beta}}{\arg\min} \left\{ \sum_{i \in s} (y_i - \boldsymbol{x}_i^T \boldsymbol{\beta})^2 + \lambda \sum_{j=1}^p \left|\beta_j\right|\right\} \end{aligned}`

**Selected Predictors**:

* Normalized Difference Vegetation Index
* Slope
* Normalized Burn Ratio
* Elevation
* Slope : Forest/Non-Forest
]

---

background-image: url("img/modeling_inference.jpeg")
background-position: left
background-size: 15%

.pull-rightish[
## Modeling and Inference

* How to build models **beyond** regression

<img src="img/bls2.png" width="25%" style="float:left; padding:10px" style="display: block; margin: auto;" />

**Mission**: "Measures labor market activity, working conditions, price changes, and productivity in the U.S. economy to support public and private decision making."

**Goal**: Estimate the number of bartenders an establishment has.
]

---

background-image: url("img/modeling_inference.jpeg")
background-position: left
background-size: 15%

.pull-rightish[
## Modeling and Inference

* Why not use regression in this example?
* Predictors come from the Quarterly Census of Employment and Wages:
    * Size class
    * Geographic information
    * Industry type
    * Whether or not it's a multi-establishment firm
]

---

background-image: url("img/modeling_inference.jpeg")
background-position: left
background-size: 15%

.pull-rightish[
## Modeling and Inference

**Regression Trees** recursively split the sample into two groups based on a predictor.

<img src="img/trees.001.jpeg" width="50%" style="display: block; margin: auto;" />
]

---

background-image: url("img/modeling_inference.jpeg")
background-position: left
background-size: 15%

.pull-rightish[
## Modeling and Inference

Regression Trees **recursively** split the sample into two groups based on a predictor.

<img src="img/trees.002.jpeg" width="50%" style="display: block; margin: auto;" />
]

---

background-image: url("img/modeling_inference.jpeg")
background-position: left
background-size: 15%

.pull-rightish[
## Modeling and Inference

Regression Trees recursively split the sample into two groups based on a predictor.

<img src="img/trees.004.jpeg" width="120%" style="display: block; margin: auto;" />

They **stop** splitting when additional splits no longer improve predictions enough to be worthwhile.
]

---

background-image: url("img/modeling_inference.jpeg")
background-position: left
background-size: 15%

.pull-rightish[
## Modeling and Inference

Regression Trees recursively split the sample into two groups based on a predictor.

<img src="img/trees.003.jpeg" width="70%" style="display: block; margin: auto;" />

At each end node, the predicted value is the mean response of the sample cases in that node.
]

---

background-image: url("img/modeling_inference.jpeg")
background-position: left
background-size: 15%

.pull-rightish[
## Modeling and Inference

* For a deep dive into regression, consider Stat 139: Linear Models and Stat 149: Generalized Linear Models

* For exposure to predictive models, try
    + Data Science 1: Introduction to Data Science
    + Data Science 2: Advanced Topics in Data Science

* Good keywords to search for:
    + Statistical learning or machine learning
    + Supervised learning
]

---

background-image: url("img/modeling_inference.jpeg")
background-position: left
background-size: 15%

.pull-rightish[
## Modeling and Inference

* Why do some test statistics follow a standard normal or a t distribution?
* What other random variables are out there?
* Why is the Central Limit Theorem true?
]

.pull-rightish[
* For a deep dive into the beautiful theory behind our data analysis, consider:
    + Stat 110: Introduction to Probability
    + Stat 111: Introduction to Statistical Inference
]

---

class: inverse, middle, center

## Review Time!

### Let's first go through the Review Sheet for the Final Exam.

---

class: middle, center

## One of the most challenging inferential ideas:

--

### Understanding the **many roles of the sample statistic**:

--

.pull-left[
As a number
]

--

.pull-right[
As a random variable
]

--

.pull-left[
As a point estimate
]

--

.pull-right[
As a test statistic
]

---

### Practice Problem

#### Identify the different roles of a statistic in the following example:

Researchers presented young children (aged 5 to 8 years) with a choice between two toy characters who were offering stickers. One character was described as mean, and the other was described as nice. The mean character offered two stickers, and the nice character offered one sticker. Researchers wanted to investigate whether children would tend to select the nice character over the mean character, despite receiving fewer stickers.
They found that 80% of the 20 children in the study selected the nice character. If the children had no preference, the probability that 80% or more would select the nice character is approximately equal to 0.0036. My best guess for the true proportion of children who would select the nice character is 0.8 (with a margin of error of 0.19 for a 95% CI).

.pull-left[
* As a number
]

.pull-right[
* As a random variable
]

.pull-left[
* As a point estimate
]

.pull-right[
* As a test statistic
]

---

### Practice Problem

I generated the following distributions but don't remember whether I generated a **sampling**, **bootstrap**, or **null** distribution. Which type is each one? Justify your answer.

.pull-left[
<img src="stat100_wk13wed_files/figure-html/unnamed-chunk-17-1.png" width="576" style="display: block; margin: auto;" />
]

.pull-right[
<img src="stat100_wk13wed_files/figure-html/unnamed-chunk-18-1.png" width="576" style="display: block; margin: auto;" />
]

---

### Practice Problem

Researchers presented young children (aged 5 to 8 years) with a choice between two toy characters who were offering stickers. One character was described as mean, and the other was described as nice. The mean character offered two stickers, and the nice character offered one sticker. Researchers wanted to investigate whether children would tend to select the nice character over the mean character, despite receiving fewer stickers. They found that 80% of the 20 children in the study selected the nice character. If the children had no preference, the probability that 80% or more would select the nice character is approximately equal to 0.0036. My best guess for the true proportion of children who would select the nice character is 0.8 (with a margin of error of 0.19 for a 95% CI).

#### For this example, would you recommend using theory-based methods to compute the p-value and construct the confidence interval? Why or why not?
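---

As a supplement to this question, here is a short R sketch (not an official solution) that compares three one-sided p-values for the sticker data: the normal approximation, an exact binomial calculation, and a simulation under the null. It assumes the common rule of thumb of at least 10 successes and 10 failures for theory-based methods for a proportion. The seed and the 10,000 simulations are arbitrary choices.

```r
# Sticker study from the slide: 16 of n = 20 children (80%) chose the nice character.
n <- 20
x <- 16
p_hat <- x / n
p0 <- 0.5  # null hypothesis: no preference

# ASSUMED rule of thumb before theory-based (normal) methods for a proportion:
# at least 10 successes and 10 failures.
null_counts <- c(n * p0, n * (1 - p0))  # expected counts under H0: 10 and 10
sample_counts <- c(x, n - x)            # observed counts used for the CI: 16 and 4

# Theory-based one-sided p-value from the normal approximation
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_normal <- pnorm(z, lower.tail = FALSE)

# Exact binomial p-value: P(X >= 16) when X ~ Binomial(20, 0.5)
p_exact <- pbinom(x - 1, size = n, prob = p0, lower.tail = FALSE)

# Simulation-based null distribution: sample proportions generated under H0
set.seed(2022)
null_props <- rbinom(10000, size = n, prob = p0) / n
p_sim <- mean(null_props >= p_hat)

# Does each set of counts meet the threshold of 10?
all(null_counts >= 10)    # condition for the test: TRUE, but only just
all(sample_counts >= 10)  # condition for the CI: FALSE (only 4 failures)
round(c(normal = p_normal, exact = p_exact, simulated = p_sim), 4)
```

Under these assumptions the normal approximation gives about 0.0036, which matches the value on the slide, while the exact binomial tail probability is about 0.0059. The null condition is barely met (10 and 10), and the observed counts (16 and 4) fall short of the rule for the confidence interval.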
---

### Practice Problem

Researchers presented young children (aged 5 to 8 years) with a choice between two toy characters who were offering stickers. One character was described as mean, and the other was described as nice. The mean character offered two stickers, and the nice character offered one sticker. Researchers wanted to investigate whether children would tend to select the nice character over the mean character, despite receiving fewer stickers. They found that 80% of the 20 children in the study selected the nice character. If the children had no preference, the probability that 80% or more would select the nice character is approximately equal to 0.0036. My best guess for the true proportion of children who would select the nice character is 0.8 (with a margin of error of 0.19 for a 95% CI).

#### Suppose we increased the sample size to 40 and still got 80%. How would the p-value change? How about the confidence interval?

---

### Practice Problem

Researchers presented young children (aged 5 to 8 years) with a choice between two toy characters who were offering stickers. One character was described as mean, and the other was described as nice. The mean character offered two stickers, and the nice character offered one sticker. Researchers wanted to investigate whether children would tend to select the nice character over the mean character, despite receiving fewer stickers. They found that 80% of the 20 children in the study selected the nice character. If the children had no preference, the probability that 80% or more would select the nice character is approximately equal to 0.0036. My best guess for the true proportion of children who would select the nice character is 0.8 (with a margin of error of 0.19 for a 95% CI).

#### What is the probability that the confidence interval contains the sample statistic?

--

#### What is the better question to ask?
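---

For the sample-size and confidence-interval questions, a similar R sketch shows what happens as n grows from 20 to 40 with the sample proportion fixed at 0.8, and why "contains the sample statistic" is the wrong thing to ask about a confidence interval. It uses normal-approximation formulas throughout, and the true proportion of 0.8 in the coverage simulation is a hypothetical value chosen only for illustration.

```r
# Sample size vs. inference, keeping the sample proportion at 0.8.
# Theory-based (normal approximation) formulas, for illustration only.
p0 <- 0.5
p_hat <- 0.8
n <- c(20, 40)

z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)         # test statistic grows with n
p_value <- pnorm(z, lower.tail = FALSE)              # so the p-value shrinks
moe <- qnorm(0.975) * sqrt(p_hat * (1 - p_hat) / n)  # MOE shrinks by a factor of 1/sqrt(2)

# Coverage simulation for 95% intervals with n = 40.
# ASSUMPTION: the true proportion is 0.8 (a hypothetical value).
set.seed(2022)
p_true <- 0.8
n_sim <- 40
phat_sim <- rbinom(10000, size = n_sim, prob = p_true) / n_sim
moe_sim <- qnorm(0.975) * sqrt(phat_sim * (1 - phat_sim) / n_sim)
lower <- phat_sim - moe_sim
upper <- phat_sim + moe_sim

# Each interval is centered at its own sample proportion, so it always contains it.
contains_stat <- mean(lower <= phat_sim & phat_sim <= upper)
# The better question: how often do the intervals capture the parameter?
contains_param <- mean(lower <= p_true & p_true <= upper)

round(p_value, 5)
round(moe, 3)
c(statistic = contains_stat, parameter = contains_param)
```

With these formulas the p-value drops from about 0.0036 to below 0.0001, and the margin of error shrinks from about 0.175 to about 0.124. The 0.175 need not match the 0.19 on the slide, which may have been computed by a different method such as bootstrapping; that is a guess. Every simulated interval contains its own sample proportion, whereas the share that captures the parameter is what the 95% confidence level describes; with n = 40 this normal-approximation interval tends to fall somewhat short of 95%.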
---

background-image: url("img/ggparty.003.jpeg")
background-position: contain
background-size: 90%

---

class: inverse, middle, center

# Thanks for a wonderful semester!