Sampling Distributions

background-image: url("img/DAW.png")
background-position: left
background-size: 50%
class: middle, center

## .base-blue[Sampling Distributions]

<br>

### .purple[Kelly McConville]

#### .purple[ Stat 100 | Week 8 | Fall 2022]


---

### Announcements

* Mid-term exam
    + Instructions page
    + No OHs starting Wednesday at noon.
    + Take-home + oral
    + Many assessment forms in this course so you have many ways to showcase your understanding.
    + Reflect

****************************

### Goals for Today

* Discuss the ❤️ of statistical inference


* **Sampling Distribution**
    + Creation
    + Properties


---

### Which Type Are You?

### Data Visualizer


### Data Wrangler


---

### Which Type Are You?

### Model Builder


### A Mix!


---


### The ❤️ of statistical inference is quantifying uncertainty

```r
library(tidyverse)
ce <- read_csv("~/shared_data/stat100/data/ce.csv")
summarize(ce, meanFINCBTAX = mean(FINCBTAX))
```

```
## # A tibble: 1 × 1
##   meanFINCBTAX
##          <dbl>
## 1       62480.
```

#### Distinguishing between the population and the sample

* **Parameters**: 
    + Based on the **population**
    + Unknown unless we have data on the whole population
    + EX: `\(\beta_0\)` and `\(\beta_1\)`
    + EX: `\(\mu\)` = population mean


* **Statistics**: 
    + Based on the **sample** data
    + Known
    + Usually estimate a population parameter
    + EX: `\(\hat{\beta}_0\)` and `\(\hat{\beta}_1\)`
    + EX: `\(\bar{x}\)` = sample mean
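
A minimal sketch of the distinction, using a simulated population. The income values here are made up for illustration and are not the `ce` data; in practice we rarely get to see the whole population.

```r
set.seed(100)  # make the random draws reproducible

# Hypothetical population: 10,000 simulated household incomes (not real data)
population <- rgamma(10000, shape = 2, scale = 30000)

# Parameter: computed from the whole population (usually unknown in practice)
mu <- mean(population)

# Statistic: computed from one random sample of size 50
samp <- sample(population, size = 50)
x_bar <- mean(samp)

c(mu = mu, x_bar = x_bar)
```

The two numbers will typically be close but not equal, which is why we need a way to describe how far off `x_bar` might be.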


---

### Quantifying Our Uncertainty

`R` has been giving us uncertainty estimates:

```r
Pollster08 <- 
  read_csv("~/shared_data/stat100/data/Pollster08.csv")

ggplot(Pollster08, aes(x = Days,
                       y = Margin, 
                       color = factor(Charlie))) +
  geom_point() +
  stat_smooth(method = "lm", se = TRUE) +
  theme(legend.position = "bottom")
```


---

### Quantifying Our Uncertainty

`R` has been giving us uncertainty estimates:

```r
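library(moderndive)
# Interaction model: separate intercept and slope for each level of Charlie
modPoll <- lm(Margin ~ Days * factor(Charlie), data = Pollster08)
# Coefficient estimates with standard errors and confidence bounds
get_regression_table(modPoll)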
```

```
## # A tibble: 4 × 7
##   term                  estimate std_error statistic p_value lower_ci upper_ci
##   <chr>                    <dbl>     <dbl>     <dbl>   <dbl>    <dbl>    <dbl>
## 1 intercept                5.57      1.09       5.11       0    3.40     7.73 
## 2 Days                    -0.598     0.121     -4.96       0   -0.838   -0.359
## 3 factor(Charlie): 1     -10.1       1.92      -5.25       0  -13.9     -6.29 
## 4 Days:factor(Charlie)1    0.921     0.136      6.75       0    0.65     1.19
```

---

### Quantifying Our Uncertainty

[News and journal articles](https://www.pewresearch.org/fact-tank/2019/12/17/more-u-s-homeowners-say-they-are-considering-home-solar-panels/) also give us uncertainty estimates.

---

### Quantifying Our Uncertainty

[News and journal articles](https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/1740-9713.01501) also give us uncertainty estimates.

---

### Statistical Inference

**Goal**: Draw conclusions about the population based on the sample.

**Main Flavors**

&rarr; Estimating numerical quantities (parameters).

&rarr; Testing conjectures.

---

### Estimation

**Goal**: Estimate a (population) parameter.

Best guess?

&rarr; The corresponding (sample) statistic

**Example**: Are GIFs just another way for people to share videos of their pets?

Want to estimate the proportion of GIFs that feature animals.
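
As a rough sketch of the GIF example in base R: the population below is simulated, and the 30% animal rate is an assumed value chosen only for illustration, not a real figure.

```r
set.seed(8)

# Hypothetical population of 5,000 GIFs; TRUE means the GIF features an animal.
# The 30% rate is an assumption for illustration only.
gifs <- rep(c(TRUE, FALSE), times = c(1500, 3500))

# Parameter: the population proportion (unknown in a real study)
p <- mean(gifs)

# Statistic: the proportion in one random sample of 20 GIFs,
# which serves as our best guess at p
one_sample <- sample(gifs, size = 20)
p_hat <- mean(one_sample)

c(p = p, p_hat = p_hat)
```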

---

### Estimation

**Key Question**: How accurate is the statistic as an estimate of the parameter?

**Helpful Sub-Question**: If we take many samples, how much would the statistic vary from sample to sample?

Need two new concepts:

* The **sampling variability** of a statistic

* The **sampling distribution** of a statistic
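
To get a feel for sampling variability before the activity, here is a small sketch that draws five samples from a simulated population of GIFs (the population and its 30% animal rate are assumptions for illustration) and computes the sample proportion in each.

```r
set.seed(2022)

# Hypothetical population of 5,000 GIFs, 30% featuring animals (assumed value)
gifs <- rep(c(TRUE, FALSE), times = c(1500, 3500))

# Draw five separate samples of 20 GIFs and compute the sample proportion in each
p_hats <- replicate(5, mean(sample(gifs, size = 20)))
p_hats
```

The five values will generally differ from one another even though every sample comes from the same population. That sample-to-sample variation is the sampling variability of the statistic.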

---

## Let's learn about these ideas through an activity!

## Go to [bit.ly/stat100gif](https://bit.ly/stat100gif).

---

## Sampling Distribution of a Statistic

Steps to Construct an (Approximate) Sampling Distribution:

1. Decide on a sample size, `\(n\)`.

2. Randomly select a sample of size `\(n\)` from the population.

3. Compute the sample statistic.

4. Put the sample back in.

5. Repeat Steps 2 - 4 many (1000+) times.
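
The steps above can be sketched in base R as follows. The population is simulated (5,000 GIFs with an assumed 30% featuring animals), and `n` and `reps` are illustrative choices rather than values from the activity.

```r
set.seed(100)

# Hypothetical population: 5,000 GIFs, 30% featuring animals (assumed value)
gifs <- rep(c(TRUE, FALSE), times = c(1500, 3500))

n <- 20        # Step 1: decide on a sample size
reps <- 1000   # Step 5: number of repetitions

# Steps 2-4: each call to sample() draws from the full population,
# so the previous sample has effectively been put back
p_hats <- replicate(reps, {
  samp <- sample(gifs, size = n)  # Step 2: randomly select a sample
  mean(samp)                      # Step 3: compute the sample statistic
})

summary(p_hats)
```

Calling `hist(p_hats)` afterwards plots the approximate sampling distribution of the sample proportion.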

---

## Sampling Distribution of a Statistic

* Center? Shape?

* Spread?
    + Standard error = standard deviation of the statistic across samples (the spread of its sampling distribution)


**What happens to the center/spread/shape as we increase the sample size?**

**What happens to the center/spread/shape if the true parameter changes?**
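
One way to explore these questions is to rerun the simulation at several sample sizes and compare the results. This is a sketch under assumed values: the population is simulated, and `sim_p_hats` is a hypothetical helper name, not a function from any package.

```r
set.seed(100)

# Hypothetical population: 5,000 GIFs, 30% featuring animals (assumed value)
gifs <- rep(c(TRUE, FALSE), times = c(1500, 3500))

# Hypothetical helper: simulate `reps` sample proportions for samples of size n
sim_p_hats <- function(n, reps = 1000) {
  replicate(reps, mean(sample(gifs, size = n)))
}

sizes <- c(10, 50, 200)
results <- sapply(sizes, function(n) {
  p_hats <- sim_p_hats(n)
  c(n = n, center = mean(p_hats), std_error = sd(p_hats))
})

t(results)  # one row per sample size
```

Compare the `center` and `std_error` columns across rows. To explore the second question, change the counts in `times = c(1500, 3500)` so the true proportion differs and rerun.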


---

## Reminders: