Confidence Intervals

background-image: url("img/DAW.png")
background-position: left
background-size: 50%
class: middle, center

.pull-right[

## .base-blue[Confidence Intervals]

### .purple[Kelly McConville]

#### .purple[ Stat 100 | Week 9 | Fall 2022]

]

---

### Announcements

* Project Assignment 2 due on Friday at 5pm.

* 🎉 We are now accepting Course Assistant/Teaching Fellow applications for Stat 100 for Spring 2023. To apply, fill out [this application](https://docs.google.com/forms/d/e/1FAIpQLScwKJaRfppRqXAzyxMMCeBUdwrzBudNONt0S9dc8lE2ZUlQwQ/viewform) by November 15th.
    + About 9-12 hours of work per week.  
    + Primary responsibilities: Lead a discussion section, hold office hours, grade assessments.

****************************

### Goals for Today

.pull-left[

* Guest speaker on modeling: Dr. Paolo Maranzano, University of Milano-Bicocca

]

.pull-right[

* Bootstrapping for estimation

* Meaning of the word **confidence**

* Interpreting confidence intervals

]

---

## Confidence Intervals

**95% CI Form**:

$$
\mbox{statistic} \pm 2\mbox{SE}
$$

Let's use the `ce` data to produce a CI for the average household income before taxes.

```r
library(dplyr)  # provides summarize(); `ce` is assumed to be loaded already
summarize(ce, meanFINCBTAX = mean(FINCBTAX))
```

```
## # A tibble: 1 × 1
## meanFINCBTAX
## <dbl>
## 1 62480.
```

What else do we need to construct the CI?

**Problem**: The SE measures how much the statistic varies from sample to sample, so computing it requires many samples from the population.  We have only one sample.

**Solution**: Approximate the sampling distribution using **ONLY OUR ONE SAMPLE!**

---

### Bootstrap Distribution

How do we approximate the sampling distribution?

.pull-left[

Steps for Generating a **Bootstrap Distribution of a Sample Statistic**:

1. Take a sample of size `$n$` (the original sample size) with replacement from the original sample.
    + This is called a bootstrap sample.

2. Compute the statistic on the bootstrap sample.

3. Repeat steps 1 and 2 many times.

]
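
The three steps above can be sketched in base R. This is only a minimal illustration: `my_sample` is a made-up, income-like sample (not the `ce` data), and the helper `one_boot_mean()`, the seed, and the 1000 repetitions are all arbitrary choices for the example.

```r
# Hypothetical stand-in for our one observed sample (NOT the `ce` data).
set.seed(100)
my_sample <- rexp(200, rate = 1 / 60000)  # skewed, income-like values

# Steps 1 and 2: resample n values with replacement, then compute the statistic.
one_boot_mean <- function(x) {
  boot_sample <- sample(x, size = length(x), replace = TRUE)
  mean(boot_sample)
}

# Step 3: repeat many times to build the bootstrap distribution.
boot_means <- replicate(1000, one_boot_mean(my_sample))

# Center and spread of the bootstrap distribution.
mean(boot_means)
sd(boot_means)
```

A histogram of `boot_means` (for example, `hist(boot_means)`) would show the bootstrap distribution of the sample mean.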

---

### Let's Practice Generating Bootstrap Samples!

**Example:** In a recent study, 23 rats showed compassion that surprised scientists. Twenty-three of the 30 rats in the study freed another trapped rat in their cage, even when chocolate served as a distraction and even when the rats would then have to share the chocolate with their freed companion. (Rats, it turns out, love chocolate.) Rats did not open the cage when it was empty or when there was a stuffed animal inside, only when a fellow rat was trapped. We wish to use the sample to estimate the proportion of rats that show empathy in this way.

**Parameter**:

**Statistic**:

You have 30 cards.  How can you use these to take a bootstrap sample?

* For each sample, compute the bootstrap statistic and put it on the class dotplot.

(We will use these data for one of the problems in P-Set 6.)
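
One way to mimic the card activity on a computer, as a rough sketch: code each card as 1 (freed the trapped rat) or 0 (did not), then draw 30 cards with replacement. The vector name `rats` and the seed are assumptions made for this illustration.

```r
# The 30 "cards": 1 = freed the trapped rat, 0 = did not (23 of 30 did).
rats <- c(rep(1, 23), rep(0, 7))

# Observed sample proportion.
p_hat <- mean(rats)

# One bootstrap sample: draw a card, record it, put it back, and repeat 30 times.
set.seed(9)
boot_sample <- sample(rats, size = length(rats), replace = TRUE)

# Bootstrap statistic: this would be one dot on the class dotplot.
boot_p_hat <- mean(boot_sample)
boot_p_hat
```

Running the last three lines again without resetting the seed gives another bootstrap sample and another dot.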

---

### Sampling Distribution Versus Bootstrap Distribution

* Data needed:

* Center:

* Spread:

---

### (Bootstrapped) Confidence Intervals

**95% CI Form**:

$$
\mbox{statistic} \pm 2\mbox{SE}
$$

We approximate `$\mbox{SE}$` with `$\widehat{\mbox{SE}}$` = the standard deviation of the bootstrapped statistics.

Caveats:

* Assuming a random sample

* Even with random samples, sometimes we get non-representative samples.  Bootstrapping can't fix that.

* Assuming the bootstrap distribution is bell-shaped and symmetric

---

### Bootstrapped Confidence Intervals

#### Two Methods

Both methods assume a random sample and a roughly bell-shaped, symmetric bootstrap distribution.

**SE Method 95% CI**:

$$
\mbox{statistic} \pm 2\widehat{\mbox{SE}}
$$

We approximate `$\mbox{SE}$` with `$\widehat{\mbox{SE}}$` = the standard deviation of the bootstrapped statistics.

**Percentile Method CI:**

If I want a P% confidence interval, I find the bounds of the middle P% of the bootstrap distribution.
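
Here is a hedged sketch of both methods applied to the rat example (23 of 30 rats freed their companion). It uses base R only; the names `rats`, `boot_props`, `se_ci`, and `pct_ci`, the seed, and the 5000 repetitions are illustrative choices, not part of the handout.

```r
# Original sample: 1 = freed the trapped rat, 0 = did not.
rats <- c(rep(1, 23), rep(0, 7))
p_hat <- mean(rats)

# Bootstrap distribution of the sample proportion.
set.seed(2022)
boot_props <- replicate(
  5000,
  mean(sample(rats, size = length(rats), replace = TRUE))
)

# SE method: statistic +/- 2 * (standard deviation of the bootstrapped statistics).
se_hat <- sd(boot_props)
se_ci <- c(lower = p_hat - 2 * se_hat, upper = p_hat + 2 * se_hat)

# Percentile method: bounds of the middle 95% of the bootstrap distribution.
pct_ci <- quantile(boot_props, probs = c(0.025, 0.975))

se_ci
pct_ci
```

When the bootstrap distribution is close to bell-shaped and symmetric, the two intervals should be similar, though usually not identical.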

---

class: center, middle

### Let's go through the `confidenceIntervals.Rmd` handout.

---

.pull-left[

]

.pull-right[

### [What do we mean by confidence?](http://www.rossmanchance.com/applets/2021/confsim/ConfSim.html)

* Confidence level = success rate of the method under **repeated sampling**

* How do I know if my ONE CI successfully contains the true value of the parameter?

* As we increase the **confidence level**, what happens to the width of the interval?

* As we increase the **sample size**, what happens to the width of the interval?

* As we increase the **number of bootstrap samples** we take, what happens to the width of the interval?

]
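
The idea of "success rate under repeated sampling" can be checked with a small simulation. This is only a sketch under invented conditions: the population (Normal with mean 50 and SD 10), the sample size, the helper `covers_truth()`, and the repetition counts are all made up for illustration. We can check coverage here only because we chose the true mean ourselves.

```r
set.seed(100)
true_mean <- 50  # known only because we invented the population
n <- 50          # size of each sample

# Draw one sample, build a bootstrap SE-method 95% CI,
# and report whether the interval contains the true mean.
covers_truth <- function() {
  one_sample <- rnorm(n, mean = true_mean, sd = 10)
  boot_means <- replicate(
    300,
    mean(sample(one_sample, size = n, replace = TRUE))
  )
  se_hat <- sd(boot_means)
  lower <- mean(one_sample) - 2 * se_hat
  upper <- mean(one_sample) + 2 * se_hat
  lower <= true_mean && true_mean <= upper
}

# Repeat for many samples: the proportion of intervals that succeed
# estimates the success rate of the method.
coverage <- mean(replicate(300, covers_truth()))
coverage
```

The proportion should land in the neighborhood of 0.95. With real data we have one sample and do not know the parameter, so we cannot tell whether our one interval is a success.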

---

## Interpreting Confidence Intervals

#### Example: Estimating average household income before taxes in the US

.pull-left[

SE Method Formula:

$$
\mbox{statistic} \pm \mbox{ME}
$$

```
## # A tibble: 1 × 1
## meanFINCBTAX
## <dbl>
## 1 62480.
```

```
## # A tibble: 1 × 3
## ME lower upper
## <dbl> <dbl> <dbl>
## 1 1871. 60609. 64351.
```

]

.pull-right[

*"The margin of [sampling] error can be described as the 'penalty' in precision for not talking to everyone in a given population. It describes the range that an answer likely falls between if the survey had reached everyone in a population, instead of just a sample of that population."* -- Courtney Kennedy, Director of Survey Research at Pew Research Center

CI = interval of **plausible** values for the **parameter**

]

#### Safe interpretation:

I am P% confident that {insert what the parameter represents in context}  is between {insert lower bound} and {insert upper bound}.
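
As a quick arithmetic check on the output above, the bounds can be rebuilt from the statistic and the margin of error. The values below are copied from the printed (rounded) output; the last line assumes the interval is a 95% SE-method interval, so that ME = 2 × estimated SE.

```r
stat <- 62480  # sample mean, as printed above
me <- 1871     # margin of error, as printed above

lower <- stat - me
upper <- stat + me
c(lower = lower, upper = upper)  # 60609 and 64351, matching the output above

# Implied estimated SE, assuming ME = 2 * SE-hat (95% SE method).
me / 2  # 935.5
```

Under that same 95% assumption, the template reads: I am 95% confident that the average household income before taxes in the US is between 60,609 and 64,351 (in the units of `FINCBTAX`).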

---

## Caution: Confidence intervals in the wild

Statement in [an article](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1255808/) for The BMJ (British Medical Journal):

---

class: middle, center

## 🤔 The second half of Stat 100 is more conceptually difficult. 🤔

---

## Reminders:

* Project Assignment 2 due on Friday at 5pm.