P-Value Pitfalls

background-image: url("img/DAW.png")
background-position: left
background-size: 50%
class: middle, center,

.pull-right[

## .base-blue[P-Value Pitfalls]

<br>

### .purple[Kelly McConville]

#### .purple[ Stat 100 | Week 11 | Fall 2022]

]

---

### Announcements

* 🎉 We are still accepting Course Assistant/Teaching Fellow applications for Stat 100 for Spring 2023. To apply, fill out [this application](https://docs.google.com/forms/d/e/1FAIpQLScwKJaRfppRqXAzyxMMCeBUdwrzBudNONt0S9dc8lE2ZUlQwQ/viewform) by November 15th.
    + About 9-12 hours of work per week.  
    + Primary responsibilities: Lead a discussion section, hold office hours, grade assessments.

****************************

### Goals for Today

.pull-left[

* Ethical Guidelines: "Integrity of Data and Methods"

* A hearty p-values discussion

]

.pull-right[

* Key probability concepts

]

---

class: , center, middle

## Ethics

#### Let's return to the ASA's ["Ethical Guidelines for Statistical Practice"](https://www.amstat.org/ASA/Your-Career/Ethical-Guidelines-for-Statistical-Practice.aspx).

##  Integrity of Data and Methods

*"The ethical statistical practitioner seeks to understand and mitigate known or suspected limitations, defects, or biases in the data or methods and communicates potential impacts on the interpretation, conclusions, recommendations, decisions, or other results of statistical practices. "*

---

##  Integrity of Data and Methods

*"Communicates the stated purpose and the intended use of statistical practices. Is transparent regarding a priori versus post hoc objectives and planned versus unplanned statistical practices. Discloses when multiple comparisons are conducted and any relevant adjustments."*

* Important ideas for us today and on this week's p-set:
    + A priori versus post hoc objectives
    + Planned versus unplanned statistical practices
    + Multiple comparisons

---

### Let's Talk About P-values

* The original intention of the p-value was as an **informal** measure to judge whether or not a researcher should take a second look.

* But to create simple statistical manuals for practitioners, the rule quickly became "p-value < 0.05" = "statistically significant".

**What were/are the consequences of "p-value < 0.05" = "statistically significant"?**

* **A consequence**: The p-value is often misinterpreted to be the probability the null hypothesis is true.  
    + A p-value of 0.003 does not mean there's a 0.3% chance that ESP doesn't exist!
    
---

### Let's Talk About P-values

.pull-left[

* **A consequence**: Researchers often put too much weight on the p-value and not enough weight on their domain knowledge/the plausibility of their conjecture.

* [xkcd comic](https://xkcd.com/1132/)

]

.pull-right[

]

---

.pull-left[

### Let's Talk About P-values

* **A consequence**: [P-hacking](https://projects.fivethirtyeight.com/p-hacking/): Cherry-picking promising findings

* [xkcd comic](https://www.explainxkcd.com/wiki/index.php/882:_Significant)

]

.pull-right[

]

---

### Let's Talk About P-values

* **A consequence**: People conflate *statistical significance* with *practical significance*.

**Example**: A recent *Nature* study of 19,000+ people found that those who meet their spouses online...

&rarr; Are less likely to divorce (p-value < 0.002)

&rarr; Are more likely to have high marital satisfaction (p-value < 0.001)

BUT the estimated **effect sizes** were tiny:

(Recall: The effect size is the difference between true value of the parameter and null value.)

* Divorce rate of 5.96% for those who met online versus 7.67% for those who met in-person.

* On a 7 point scale, happiness value of 5.64 for those who met online versus 5.48 for those who met in-person.

**Question**: Do these results provide compelling evidence that one should change their behavior?

---

### Let's Talk About P-values

The American Statistical Association created a set of principles to address misconceptions and misuse of p-values:

(1) P-values can indicate how incompatible the data are with a specified statistical model.

(2) P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.

(3) Scientific conclusions and business or policy decisions should not be based only on whether or not a p-value passes a specific threshold (i.e. 0.05).

(4) Proper inference requires full reporting and transparency.

(5) A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.

(6) By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.

---

### Let's Talk About P-values

* Despite its issues, p-values are still quite popular and can still be a useful tool **when used properly**.

* In 2014, George Cobb a professor from Mount Holyoke College poised the following questions (and answers):

* Understanding p-values and being able to **interpret a p-value in context** is a learning objective of Stat 100.
    + Ex: If ESP doesn't exist, the probability of guessing correctly on at least 106 out of 329 trials is 0.003.

* Understanding that a small p-value means evidence for `$H_a$` is important.
    + Ex: Because the p-value is small, we have evidence for ESP.

* Understanding that what you mean by **small** should depend on your field and whether a Type I Error or Type II Error is worse for **your particular research question**.

--
    
* Your ability to tell if a # is less than 0.05 is not a learning objective for Stat 100.

---

background-image: url("img/ci_diagram.png")
background-position: contain
background-size: 70%

### Statistical Inference Zoom Out -- Estimation

**Question**: How did folks do inference before computers?

---

background-image: url("img/hyp_testing_diagram.png")
background-position: contain
background-size: 80%

### Statistical Inference Zoom Out -- Testing

**Question**: How did folks do inference before computers?

---

class: , middle, center

## This means we need to learn about probability models!

---

### Probability Models

*"All models are wrong but some are useful."*  -- George Box

.pull-left[

**Question**: How can we use theoretical models to approximate our distributions?

]

.pull-right[

Before we can answer that question, we need to learn some probability concepts that will help us understand these models.

]

---

### Probability Concepts

**Random process**: outcomes is uncertain.

* EX: Roll 6 sided die.

The **probability** of an outcome is the "long-run proportion" of times the outcome occurs.

* EX: Want probability of rolling the #5
    + Let `$p_n$` = proportion of rolls that are 5 in n rolls
    + Let `$p$` = probability of rolling 5 = `$P$`(roll 5)

**Law of Large Numbers** (LLN) says that as `$n$` increases, `$p_n$` converges to `$p$`.

---

### Probability Concepts

.pull-left[

**Question**: Why is the LLN important to us?

]

.pull-left[

**Question**: How have we been computing p-values?

]

.pull-left[

$$
\mbox{p-value} = \frac{\mbox{# of extreme test stats}}{\mbox{# of replications}}
$$

LLN tells us the proportion of extreme test stats is roughly equal to the true probability of observing the test statistic or more extreme under `$H_o$`.

]

.pull-right[

]

---

### Probabilities: `$P(\mbox{event})$`

* Probability of event = long-run proportion of event

#### Useful properties of probabilities:

(1)  `$0 \leq P(\mbox{event}) \leq 1$`

.pull-left[

(2) If two events are disjoints (have no outcomes in common), then

$$
P(\mbox{event 1 or event 2}) = P(\mbox{event 1}) + P(\mbox{event 1}).
$$

]

.pull-right[

<img src="stat100_wk11wed_files/figure-html/unnamed-chunk-6-1.png" width="576" style="display: block; margin: auto;" />
We use this fact when we find a two-sided p-value.

]

---

### Probabilities: `$P(\mbox{event})$`

#### Useful properties of probabilities:

.pull-left[

(3) Complement Rule

$$
P(\mbox{event}) = 1 - P(\mbox{not that event}) = 1 - P(\mbox{event}^c)
$$

Sometimes it is "easier" to find the complement event's probability.

]

.pull-right[

]

---

### Random Variables

**Random variable** (RV) is a random process that **takes on numerical values**.

* Discrete RV: Takes on discrete values (countable number of possible values)
    + EX: 0, 1, 2, 3, ...

* Continuous RV: Can take on any value in a interval

* Random variables have **probability functions** that tell us the likelihood of specific values.

* For discrete RV, probability function is:

$$
p(x) = P(X = x)
$$
where `$\sum p(x) = 1$`.

* Example: X = # when you roll die

---

### Random Variables

For a random variable, care about its:

* Probability function: `$p(x) = P(X = x)$`

* Center: Mean of a RV:

$$
\mu = \sum x p(x)
$$

* Spread: Variance of a RV:

$$
\sigma^2 = \sum (x - \mu)^2 p(x)
$$
* And, standard deviation of a RV:

$$
\sigma = \sqrt{ \sum (x - \mu)^2 p(x)}
$$

**Example**: What is the mean and variance for `$X$` = # when you roll die?

**Question**: How do these measures relate to `$\bar{x}$` and `$s^2$`?

---

### Another Example:

Suppose 4 students have still not received their graded Stat 100 Midterm (yes, let's pretend we actually have hand-written work) and that I hand back the exams randomly to each student.  Let X = the number of students who get their correct exam.

**Questions:**

* Let's say the student's names are A(licia), B(ob), C(olin), and D(onna) and they are sitting in a row ABCD.  One possible outcome is ABDC (1st exam goes to A, 2nd to B, 3rd to D, 4th to C).  In that case, what does X equal?

* List out all possible outcomes.  And for each outcome, determine what X equals.

* Why is P(X = 3) = 0?

* Write out the probability distribution for X.

* Determine the mean value of X.

* Determine the standard deviation of X.

* What is the probability that at least one student gets their correct exam?

---

## Reminders:

* 🎉 We are now accepting Course Assistant/Teaching Fellow applications for Stat 100 for Spring 2023. To apply, fill out [this application](https://docs.google.com/forms/d/e/1FAIpQLScwKJaRfppRqXAzyxMMCeBUdwrzBudNONt0S9dc8lE2ZUlQwQ/viewform) by November 15th.
    + About 9-12 hours of work per week.  
    + Primary responsibilities: Lead a discussion section, hold office hours, grade assessments.