Hypothesis Testing

background-image: url("img/DAW.png")
background-position: left
background-size: 50%
class: middle, center

## .base-blue[Hypothesis Testing]

### .purple[Kelly McConville]

#### .purple[ Stat 100 | Week 10 | Fall 2022]


---

### Announcements

* Don't forget about the weekly lecture quiz!

****************************

### Goals for Today

* Learn the **language** of hypothesis testing (including **p-values**)

* Practice framing research questions in terms of hypotheses


* Use `infer` to conduct hypothesis tests in `R`

* Learn how to generate **null distributions**


---

### Hypothesis Testing

#### Big Idea:

* Make an assumption about the population **parameter**.

* Generate a sampling distribution for a *test* statistic based on that assumption.
    + Called a **null distribution**

* See if the test statistic based on the observed sample aligns with the generated sampling distribution or not.

* If it does, then we didn't learn much.
    + (Didn't prove the parameter equals the assumed value but it is still plausible)

* If it doesn't, then we have evidence that our assumption about the parameter was wrong.

---

### ESP Example

**Big Idea:**

* Make an assumption about the population parameter.
    + ESP doesn't exist, so p, the probability of guessing correctly, equals 0.25.

* Generate a sampling distribution for a *test* statistic based on that assumption.
    + Called a **null distribution**


---

### ESP Example

**Big Idea:**

* See if the test statistic based on the observed sample aligns with the generated sampling distribution or not.
    + It is in the center-ish of the distribution. It isn't an unusual value.

* If it does, then we didn't learn much. (Didn't prove the parameter equals the assumed value but it is still plausible)
    + It is still possible that ESP doesn't exist.


---

### ESP Example

**Big Idea:**

* See if the test statistic based on the observed sample aligns with the generated sampling distribution or not.
    + It is far in the tails of the distribution. It is an unusual value.

* If it doesn't, then we have evidence that our assumption about the parameter was wrong.
    + We have evidence ESP exists.


---

## Let's Take a Step Back from Our Last Statement...

* Two important words in data analysis: 
    + Reproducibility
    + Replicability

* **Reproducibility**: If I give you the raw data and my write-up, you will get to the exact same final numbers that I did.

* By using `RMarkdown` Documents, we are learning a **reproducible** workflow.

* **Replicability**: If you follow my study design but collect new data (i.e. repeat my study on new subjects), you will come to the same conclusions that I did.

* Science is going through a **replication crisis** right now.
    + [In cancer science, many "discoveries" don't hold up](https://www.reuters.com/article/us-science-cancer-idUSBRE82R12P20120328)
    + [Estimating the reproducibility of psychological science](https://science.sciencemag.org/content/349/6251/aac4716)
    + [Psychology Is Starting To Deal With Its Replication Problem](https://fivethirtyeight.com/features/psychology-is-starting-to-deal-with-its-replication-problem/)

* And, sadly, **replication** studies of Bem and Honorton's ESP trials typically failed to find evidence of ESP.

---

## Now Let's Start Learning the Language of Hypothesis Testing.
    
---

### Hypothesis Testing Framework

Have two competing hypotheses:

* Null Hypothesis `\((H_o)\)`: Dull hypothesis, status quo, random chance, no effect...

* Alternative Hypothesis `\((H_a)\)`: (Usually) contains the researchers' conjecture.

Must first take those hypotheses and translate them into statements about the population parameters so that we can test them with sample data!

#### Example:

`\(H_o\)`: ESP doesn't exist.

`\(H_a\)`: ESP does exist.

Then translate into a statistical problem:

`\(p\)` = probability of guessing correctly out of 4 images

`\(H_o\)`: `\(p = 0.25\)` (or `\(p \leq 0.25\)` )

`\(H_a\)`: `\(p > 0.25\)`

---

## Let's Practice Framing the Hypotheses.

---

### Example 1

In 2005, researchers Antonioli and Reveley posed the question "Does swimming with the dolphins help depression?"  To study this question, they recruited 30 subjects with clinical depression whose ages ranged from 18 to 65 years old.  Each subject discontinued any other treatment four weeks prior to the experiment and was randomly assigned to either swim with dolphins (the treatment group) or to do yoga (the control group).  After two weeks, each subject was categorized as “showed substantial improvement” or “did not show substantial improvement”.

#### Write out Ho and Ha in terms of conjectures.

#### Write out Ho and Ha in terms of population parameters.  (Make sure to first define the population parameter in the context of the problem.)

---

### Example 2

Let’s return to this example: Can a simple smile have an effect on punishment assigned following an infraction? In a 1995 study, Hecht and LaFrance examined the effect of a smile on the leniency of disciplinary action for wrongdoers. Participants in the experiment took on the role of members of a college disciplinary panel judging students accused of cheating. For each suspect, along with a description of the offense, a picture was provided with either a smile or neutral facial expression. A leniency score was calculated based on the disciplinary decisions made by the participants.

#### Write out Ho and Ha in terms of conjectures.

#### Write out Ho and Ha in terms of population parameters.  (Make sure to first define the population parameter in the context of the problem.)

---

### Example 3

Can you tell if a mouse is in pain by looking at its facial expression? A recent study created a “mouse grimace scale” and tested to see if there was a positive correlation between scores on that scale and the degree and duration of pain (based on injections of a weak and mildly painful solution). The study’s authors believe that if the scale applies to other mammals as well, it could help veterinarians test how well painkillers and other medications work in animals.

#### Write out Ho and Ha in terms of conjectures.

#### Write out Ho and Ha in terms of population parameters.  (Make sure to first define the population parameter in the context of the problem.)

---

### Hypothesis Testing Framework

Flavors of hypotheses:

* `\(H_o\)`: parameter `\(=\)` null value

* One of the following:
    + `\(H_a\)`: parameter `\(\neq\)` null value
    + `\(H_a\)`: parameter `\(>\)` null value
    + `\(H_a\)`: parameter `\(<\)` null value
 
--

**Question**: But doesn't `\(H_o\)` sometimes represent `\(\leq\)` or `\(\geq\)`?

---

### Hypothesis Testing Framework

Once you have set up your hypotheses...

* Collect data.

* Assume `\(H_o\)` is correct.

* Quantify the likelihood of the sample results using a test statistic.

* **Test statistic**:  Numerical summary of the sample data
    + Often is equal to the sample statistic.
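
As a small illustration of a test statistic that equals the sample statistic, here is a sketch in base `R`. The `guess` vector is made up for this example (it is not the ESP study data), and `p_hat` is just an illustrative name.

```r
# Hypothetical guesses, invented for illustration (not the ESP study data)
guess <- c("correct", "incorrect", "incorrect", "correct",
           "incorrect", "incorrect", "incorrect", "correct")

# Test statistic: the sample proportion of correct guesses
p_hat <- mean(guess == "correct")
p_hat  # 3 of the 8 guesses are correct, so 0.375
```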
    
---

### Hypothesis Testing Framework -- Null Distribution

**Null distribution**: Sampling distribution of the test statistic if the null hypothesis is true.

**Question**: How do we use the null distribution to quantify the likelihood of the sample results?

**p-value** = Probability of observing a test statistic as extreme as, or more extreme than, the observed one if `\(H_o\)` is true

* More extreme = direction of `\(H_a\)`

* Find the proportion of test statistics in the null distribution that are equal to or more extreme than the observed test statistic
    + Let's draw some pictures.
    
--

* If the p-value is small, we have evidence for `\(H_a\)`.
    + Notice I am talking about `\(H_a\)`, not `\(H_o\)` here!

* If the p-value is not small, we don't have evidence for `\(H_a\)`.
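
To make "proportion equal to or more extreme" concrete, here is a minimal sketch in base `R`. The ten null statistics and the observed value are made up, and `sim_p_value()` is a hypothetical helper written for this slide, not a function from `infer`. For a two-sided `\(H_a\)` it doubles the smaller tail proportion, which is one common convention rather than the only option.

```r
# Toy null distribution: ten made-up test statistics (illustration only)
null_stats <- c(0.20, 0.22, 0.24, 0.25, 0.25, 0.26, 0.27, 0.28, 0.30, 0.31)
obs_stat   <- 0.30  # hypothetical observed test statistic

# Proportion of null statistics at least as extreme as the observed one,
# where "extreme" follows the direction of H_a
sim_p_value <- function(null_stats, obs_stat,
                        direction = c("greater", "less", "two-sided")) {
  direction <- match.arg(direction)
  upper <- mean(null_stats >= obs_stat)  # share at or above the observed value
  lower <- mean(null_stats <= obs_stat)  # share at or below the observed value
  switch(direction,
    "greater"   = upper,
    "less"      = lower,
    "two-sided" = min(1, 2 * min(upper, lower))  # double the smaller tail
  )
}

sim_p_value(null_stats, obs_stat, "greater")    # 0.2
sim_p_value(null_stats, obs_stat, "less")       # 0.9
sim_p_value(null_stats, obs_stat, "two-sided")  # 0.4
```

With a real null distribution there would be many more than ten statistics, but the calculation is the same.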

---

## Generating Null Distributions

**For the sample proportion in the ESP Example:**

#### Steps:

1. Flip unfair coin (prop heads = 0.25) 329 times.
2. Compute proportion of heads.
3. Repeat 1 and 2 many times.

`R` code using the `infer` package:

```r
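library(infer)  # provides specify(), hypothesize(), generate(), calculate()

# Simulate 1000 samples under H_o: p = 0.25 and record each sample proportion
null_dist <- esp %>%
  specify(response = guess, success = "correct") %>%
  hypothesize(null = "point", p = 0.25) %>%
  generate(reps = 1000, type = "draw") %>%
  calculate(stat = "prop")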
```
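
The three coin-flipping steps can also be sketched directly in base `R`, without `infer`. This is only one way to do it: the seed is arbitrary, and the names `n_trials`, `p_null`, `reps`, and `null_props` are chosen here for illustration.

```r
set.seed(100)  # arbitrary seed so the simulation is reproducible

n_trials <- 329   # number of flips per simulated sample
p_null   <- 0.25  # probability of heads (a correct guess) if H_o is true
reps     <- 1000  # how many times to repeat steps 1 and 2

# Steps 1 and 2: rbinom() counts the heads in n_trials flips of the unfair
# coin, and dividing by n_trials turns the count into a proportion.
# Step 3: asking for `reps` draws repeats that process many times.
null_props <- rbinom(reps, size = n_trials, prob = p_null) / n_trials

mean(null_props)  # should land close to 0.25
```

Each element of `null_props` is one simulated sample proportion, so the whole vector plays the role of the null distribution.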

For different variable types, we need to move beyond using a coin to conceptualize the null distribution.

---

### Let's return to the ESP example but now using `infer`.  The "hypothesisTestingFramework.Rmd" file can be found in the Handouts folder.

### Let's go through the first example.

---

###  Returning to Example 1

Here's a contingency table of `improve` and `group`.

```r
dolphins %>%
  count(group, improve)
```

```
##       group improve  n
## 1   Control      no 12
## 2   Control     yes  3
## 3 Treatment      no  5
## 4 Treatment     yes 10
```

#### How can we generate the null distribution for this scenario?

#### Once you have your simulated null statistic, add it to the class dotplot on the board.
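
One possible approach, sketched in base `R` rather than with `infer`: if `\(H_o\)` is true, the group labels have nothing to do with improvement, so we can shuffle them and recompute the statistic. The sketch rebuilds the 30 subjects from the counts in the table above. The names `dolphins_sim` and `diff_in_props()` are made up for this example, and it assumes the alternative is that the treatment group improves at a higher rate.

```r
set.seed(100)  # arbitrary seed so the shuffles are reproducible

# Rebuild the 30 subjects from the counts in the contingency table
dolphins_sim <- data.frame(
  group   = rep(c("Control", "Treatment"), each = 15),
  improve = c(rep("no", 12), rep("yes", 3),    # Control: 12 no, 3 yes
              rep("no", 5),  rep("yes", 10))   # Treatment: 5 no, 10 yes
)

# Test statistic: difference in improvement proportions (Treatment - Control)
diff_in_props <- function(group, improve) {
  mean(improve[group == "Treatment"] == "yes") -
    mean(improve[group == "Control"] == "yes")
}

obs_diff <- diff_in_props(dolphins_sim$group, dolphins_sim$improve)

# Null distribution: shuffle the group labels, recompute, repeat 1000 times
null_diffs <- replicate(
  1000,
  diff_in_props(sample(dolphins_sim$group), dolphins_sim$improve)
)

# Proportion of shuffled statistics at least as large as the observed one
p_value <- mean(null_diffs >= obs_diff)
```

A single shuffle gives one simulated null statistic, like the one each person adds to the class dotplot.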

---

### Let's look at the second example of the "hypothesisTestingFramework.Rmd".

---

## Reminders:

* Don't forget about the weekly lecture quiz!