background-image: url("img/DAW.png")
background-position: left
background-size: 50%
class: middle, center

.pull-right[

## .base-blue[Sampling Distributions]

<br>

<br>

### .purple[Kelly McConville]

#### .purple[ Stat 100 | Week 8 | Fall 2022]

]

---

### Announcements

* Mid-term exam
    + Instructions page
    + No OHs starting Wednesday at noon.
    + Takehome + Oral
    + Many assessment forms in this course so you have many ways to showcase your understanding.
    + Reflect

****************************

--

### Goals for Today

.pull-left[

* Discuss the ❤️ of statistical inference

]

.pull-right[

* **Sampling Distribution**
    + Creation
    + Properties

]

---

class: center, middle

### Which Type Are You?

.pull-left[

### Data Visualizer

<iframe src="https://giphy.com/embed/d31vTpVi1LAcDvdm" width="480" height="362" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/netflix-d31vTpVi1LAcDvdm">via GIPHY</a></p>

]

.pull-right[

### Data Wrangler

<iframe src="https://giphy.com/embed/DbaUtl1DcLyrdwhzGJ" width="480" height="362" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/Amalgia-DbaUtl1DcLyrdwhzGJ">via GIPHY</a></p>

]

---

class: center, middle

### Which Type Are You?

.pull-left[

### Model Builder

<iframe src="https://giphy.com/embed/xZsLh7B3KMMyUptD9D" width="480" height="270" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/tlceurope-xZsLh7B3KMMyUptD9D">via GIPHY</a></p>

]

--

.pull-right[

### A Mix!

<iframe src="https://giphy.com/embed/cmzp1t3EJ87XNOaHRJ" width="260" height="350" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/giphcrawler2018-cmzp1t3EJ87XNOaHRJ">via GIPHY</a></p>

]

---

class: middle, center

<img src="img/DAW.png" width="750px"/>

---

### The ❤️ of statistical inference is quantifying uncertainty

<img src="img/week4.005.jpeg" width="70%" style="display: block; margin: auto;" />

--

```r
library(tidyverse)
ce <- read_csv("~/shared_data/stat100/data/ce.csv")

summarize(ce, meanFINCBTAX = mean(FINCBTAX))
```

```
## # A tibble: 1 × 1
##   meanFINCBTAX
##          <dbl>
## 1       62480.
```

---

### The ❤️ of statistical inference is quantifying uncertainty

```r
library(tidyverse)
ce <- read_csv("~/shared_data/stat100/data/ce.csv")

summarize(ce, meanFINCBTAX = mean(FINCBTAX))
```

```
## # A tibble: 1 × 1
##   meanFINCBTAX
##          <dbl>
## 1       62480.
```

#### Distinguishing between the population and the sample

--

.pull-left[

* **Parameters**:
    + Based on the **population**
    + Unknown if we don't have data on the whole population
    + EX: `\(\beta_0\)` and `\(\beta_1\)`
    + EX: `\(\mu\)` = population mean

]

.pull-right[

* **Statistics**:
    + Based on the **sample** data
    + Known
    + Usually estimate a population parameter
    + EX: `\(\hat{\beta}_0\)` and `\(\hat{\beta}_1\)`
    + EX: `\(\bar{x}\)` = sample mean

]

---

### Quantifying Our Uncertainty

`R` has been giving us uncertainty estimates:

.pull-left[

```r
Pollster08 <- read_csv("~/shared_data/stat100/data/Pollster08.csv")

ggplot(Pollster08, aes(x = Days, y = Margin, color = factor(Charlie))) +
  geom_point() +
  stat_smooth(method = "lm", se = TRUE) +
  theme(legend.position = "bottom")
```

]

.pull-right[

<img src="stat100_wk08wed_files/figure-html/polls-1.png" width="768" style="display: block; margin: auto;" />

]

---

### Quantifying Our Uncertainty

`R` has been giving us uncertainty estimates:

```r
modPoll <- lm(Margin ~ Days*factor(Charlie), data = Pollster08)

library(moderndive)
get_regression_table(modPoll)
```

```
## # A tibble: 4 × 7
##   term                  estimate std_error statistic p_value lower_ci upper_ci
##   <chr>                    <dbl>     <dbl>     <dbl>   <dbl>    <dbl>    <dbl>
## 1 intercept                5.57      1.09       5.11       0    3.40     7.73
## 2 Days                    -0.598     0.121     -4.96       0   -0.838   -0.359
## 3 factor(Charlie): 1     -10.1       1.92      -5.25       0  -13.9     -6.29
## 4 Days:factor(Charlie)1    0.921     0.136      6.75       0    0.65     1.19
```

---

### Quantifying Our Uncertainty

The [news and journal articles](https://www.pewresearch.org/fact-tank/2019/12/17/more-u-s-homeowners-say-they-are-considering-home-solar-panels/) are also giving us uncertainty estimates:

.pull-left[

<img src="img/solar_panels.png" width="65%" style="display: block; margin: auto;" />

]

--

.pull-right[

<img src="img/margin_of_error.png" width="85%" style="display: block; margin: auto;" />

]

---

### Quantifying Our Uncertainty

The [news and journal articles](https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/1740-9713.01501) are also giving us uncertainty estimates:

<img src="img/ci_swimming.png" width="65%" style="display: block; margin: auto;" />

---

### Statistical Inference

**Goal**: Draw conclusions about the population based on the sample.

--

**Main Flavors**

→ Estimating numerical quantities (parameters).

--

→ Testing conjectures.

---

### Estimation

**Goal**: Estimate a (population) parameter.

--

Best guess?

→ The corresponding (sample) statistic

--

**Example**: Are GIFs just another way for people to share videos of their pets?

<iframe src="https://giphy.com/embed/MCfhrrNN1goH6" width="280" height="240" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/easy-ear-MCfhrrNN1goH6">via GIPHY</a></p>

--

Want to estimate the proportion of GIFs that feature animals.

---

### Estimation

**Key Question**: How accurate is the statistic as an estimate of the parameter?

--

**Helpful Sub-Question**: If we take many samples, how much would the statistic vary from sample to sample?

--

Need two new concepts:

--

* The **sampling variability** of a statistic

--

* The **sampling distribution** of a statistic

---

class: center, middle

## Let's learn about these ideas through an activity!

## Go to [bit.ly/stat100gif](https://bit.ly/stat100gif).

---

## Sampling Distribution of a Statistic

Steps to Construct an (Approximate) Sampling Distribution:

1. Decide on a sample size, `\(n\)`.

--

2. Randomly select a sample of size `\(n\)` from the population.

--

3. Compute the sample statistic.

--

4. Put the sample back in.

--

5. Repeat Steps 2 - 4 many (1000+) times.

---

## Sampling Distribution of a Statistic

<img src="img/samp_dist.png" width="55%" style="display: block; margin: auto;" />

.pull-left[

* Center? Shape?

* Spread?
    + Standard error = standard deviation of the statistic

]

--

.pull-right[

**What happens to the center/spread/shape as we increase the sample size?**

**What happens to the center/spread/shape if the true parameter changes?**

]

---

## Reminders:

* Mid-term exam
    + Instructions page
    + No OHs starting Wednesday at noon.
    + Takehome + Oral
    + Many assessment forms in this course so you have many ways to showcase your understanding.
    + Reflect
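
---

## Sampling Distribution of a Statistic

The five construction steps listed earlier (choose a sample size, draw a sample, compute the statistic, put the sample back, repeat) can be sketched in base `R`. This is a minimal sketch under stated assumptions, not the code behind the class activity: the population `gifs` is simulated, and the 30% share of animal GIFs, the sample size `n = 25`, the 1000 repetitions, and the seed are all hypothetical values chosen for illustration.

```r
set.seed(100)  # assumed seed, used only so the sketch is reproducible

# Hypothetical population: 10,000 GIFs, 30% of which feature animals (assumed)
gifs <- rep(c(TRUE, FALSE), times = c(3000, 7000))

n <- 25       # Step 1: decide on a sample size
reps <- 1000  # Step 5: how many times to repeat Steps 2-4

# Steps 2-4: draw a sample of size n, compute the sample proportion, and
# "put the sample back" (every call to sample() draws from the full population)
p_hat <- replicate(reps, mean(sample(gifs, size = n)))

# Center and spread of the approximate sampling distribution
mean(p_hat)  # center: should land near the population proportion, 0.3
sd(p_hat)    # spread: the estimated standard error of the sample proportion

# Shape of the approximate sampling distribution
hist(p_hat, main = "Approximate sampling distribution",
     xlab = "Sample proportion of animal GIFs")
```

Rerunning the sketch with a larger `n` should shrink `sd(p_hat)`, and changing the assumed 30% share should shift the center of the histogram.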