Logistic Regression

background-image: url("img/DAW.png")
background-position: left
background-size: 50%
class: middle, center, inverse

.pull-right[

## .whitish[Expanding our Modeling]

## .whitish[Toolkit:]

## .whitish[Logistic Regression]

<br>

### .whitish[Kelly McConville]

#### .yellow[ Stat 100 | Week 13 | Spring 2022]

]

---

background-image: url("img/ggparty.003.jpeg")
background-position: contain
background-size: 90%

---

class: middle, center

# `ggparty` Rain Location: Science Center 316

---

### Announcements

* We are currently finalizing [the OH schedule](https://docs.google.com/spreadsheets/d/18ckvWMtKWJrpq6YS-FuToAMrbHPP722FF79ZvuZ01Ys/edit?usp=sharing) for the next couple of weeks.
* Will receive the final exam review sheet in Wednesday's lecture.
    + Along with link to sign-up for oral exam time slot.
* Final Project Assignment due Friday, May 6th.
* No sections this week!

****************************

### Goals for Today

.pull-left[

* Revisit model selection.

* Motivate Logistic Regression.

]

.pull-right[

* For the Logistic Regression Model, consider
    + Fitting the model in `R`.
    + Interpreting the coefficients.
    + Assessing the quality of the model.

]

---

### Comparing Models with the Adjusted `$R^2$`

**Strategy**: Compute the adjusted `$R^2$` value for each model and pick the one with the highest adjusted `$R^2$`.

`\begin{align*}
\mbox{adj} R^2 &= 1- \frac{\sum (y - \hat{y})^2}{\sum (y - \bar{y})^2} \left(\frac{n - 1}{n - p - 1} \right)
\end{align*}`

Let's look at one more example.

---

### Comparing Models with the Adjusted `$R^2$`

---

### Comparing Models with the Adjusted `$R^2$`

```r
mod1 <- lm(Margin ~ Days*factor(Charlie), data = Pollster08)
mod2 <- lm(Margin ~ Days*factor(Meltdown), data = Pollster08)
mod3 <- lm(Margin ~ poly(Days, degree = 2, 
                         raw = TRUE), data = Pollster08)
mod4 <- lm(Margin ~ poly(Days, degree = 20, 
                         raw = TRUE), data = Pollster08)
```

`\begin{align*}
\mbox{adj} R^2 &= 1- \frac{\sum (y - \hat{y})^2}{\sum (y - \bar{y})^2} \left(\frac{n - 1}{n - p - 1} \right)
\end{align*}`

Which of these 4 models likely has the highest adjusted R2?  Which likely has the lowest?

---

```r
library(broom)
glance(mod1)
```

```
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.417         0.399  2.87      23.4 1.71e-11     3  -250.  510.  523.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
```

```r
glance(mod2)
```

```
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic      p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>        <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.324         0.303  3.09      15.7 0.0000000216     3  -258.  525.  539.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
```

---

```r
glance(mod3)
```

```
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.349         0.336  3.01      26.6 5.71e-10     2  -256.  519.  530.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
```

```r
glance(mod4)
```

```
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic     p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>       <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.486         0.396  2.88      5.41 0.000000158    15  -244.  522.  566.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
```

---

### Logistic Regression

**Response variable**: A **categorical** variable with **two** categories

`\begin{align*}
Y =   \left\{
\begin{array}{ll}
      1 & \mbox{success} \\
      0 & \mbox{failure} \\
\end{array} 
\right.  
\end{align*}`

`$Y \sim$` Bernoulli `$(p)$` where `$p = P(Y = 1) = P(\mbox{success})$`.

**Explanatory variables**: Can be a mix of categorical and quantitative

**Goal**: Build a model for `$P(Y = 1)$`.

---

### Why can't we use linear regression?

.pull-left[

* Regression line = estimated **probability** of success

* For valid values of `$x$`, we are predicting the probability is less than 0 or greater than 1.

]

.pull-right[

]

---

### Why can't we use linear regression?

.pull-left[

* Regression line = estimated **probability** of success

* For valid values of `$x$`, we are predicting the probability is less than 0 or greater than 1.

* The estimated probabilities based on the logistic regression model are bounded between 0 and 1.

]

.pull-right[

]

What does the **logistic regression model** look like?

---

### Logistic Regression Model

`\begin{align*}
\log\left(\frac{P(Y = 1)}{1  - P(Y = 1)}  \right) &= \beta_o + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p 
\end{align*}`

Left hand side has many interpretations:

`\begin{align*}
\mbox{LHS} &= \log\left(\frac{P(Y = 1)}{1  - P(Y = 1)}  \right)\\ 
&= \log \left( \mbox{odds (of success)}  \right)\\
&= \mbox{logit}(P(Y = 1))
\end{align*}`

Note:

$$ 
\mbox{odds} = \frac{\mbox{prob of success}}{\mbox{prob of failure}}
$$
--

The odds will be important when we go to interpret the coefficients!

---

### Probability of Success

#### Logistic Regression Model:

`\begin{align*}
\log\left(\frac{P(Y = 1)}{1  - P(Y = 1)}  \right) &= \beta_o + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p 
\end{align*}`

But I don't want an equation for the log odds!

Want an equation for `$P(Y = 1)$`!

Take

`\begin{align*}
\log\left(\frac{P(Y = 1)}{1  - P(Y = 1)}  \right) &= \beta_o + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p 
\end{align*}`

and solve for only `$P(Y = 1)$` on the LHS:

`\begin{align*}
P(Y = 1)  = 
\frac{\exp(\beta_o + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p )}{1  + \exp(\beta_o + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p )}
\end{align*}`

---

### Probability of Success

Have:

`\begin{align*}
P(Y = 1)  = 
\frac{\exp(\beta_o + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p )}{1  + \exp(\beta_o + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p )}
\end{align*}`

Notes:

&rarr; Bounded by 0 and 1.

&rarr; Estimate `$\beta$`'s with a method called "Maximum Likelihood Estimation".

**Fitted Model**:

`\begin{align*}
\widehat{P(Y = 1)} = 
\frac{\exp(\hat{\beta}_o + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_p x_p )}{1  + \exp(\hat{\beta}_o + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_p x_p )}
\end{align*}`

---

### Example: Palmer Penguins

.pull-left[

**Goal**: Build a model that predicts whether a penguin is a **Chinstrap** or a **Adelie**.

]

.pull-right[

```r
library(palmerpenguins)

penguins2 <- penguins %>%
  filter(species != "Gentoo") %>%
  mutate(species_num = 
           if_else(species == "Chinstrap",
                   1, 0)) %>%
  drop_na(species_num, flipper_length_mm)

count(penguins2, species)
```

```
## # A tibble: 2 × 2
##   species       n
##   <fct>     <int>
## 1 Adelie      151
## 2 Chinstrap    68
```

]

---

### Model: Predict species with flipper length

.pull-left[

```r
ggplot(data = penguins2, 
       mapping = aes(x = flipper_length_mm,
                     y = species_num)) +
  geom_jitter(size = 3, alpha = 0.4,
              width = 0, height = 0.05) + 
  labs(y = "P(Y = 1)") 
```

]

.pull-right[

]

---

### Model: Predict species with flipper length

.pull-left[

```r
ggplot(data = penguins2, 
       mapping = aes(x = flipper_length_mm,
                     y = species_num)) +
  geom_jitter(size = 3, alpha = 0.4,
              width = 0, height = 0.05) + 
  labs(y = "P(Y = 1)") +
  geom_smooth(method = 'glm',
              method.args = 
                list(family = "binomial"))
```

]

.pull-right[

]

**Question**: How do we fit the model (the blue curve) in `R`?

---

### Model: Predict species with flipper length

```r
mod <- glm(species ~ flipper_length_mm, family = "binomial",
           data = penguins2)
library(broom)
tidy(mod)
```

```
## # A tibble: 2 × 5
##   term              estimate std.error statistic     p.value
##   <chr>                <dbl>     <dbl>     <dbl>       <dbl>
## 1 (Intercept)        -25.8      4.90       -5.26 0.000000141
## 2 flipper_length_mm    0.129    0.0253      5.12 0.000000299
```

But what do the coefficient estimates `$\hat{\beta}_o = -25.8$` and `$\hat{\beta}_1 = 0.129$` represent?

---

### Interpretation of Coefficients

#### Define two concepts:

(1) Odds:

$$ 
\mbox{odds} = \frac{\mbox{prob of success}}{\mbox{prob of failure}} = \frac{P(Y = 1)}{1  - P(Y = 1)} 
$$
--

Recall:

`\begin{align*}
\log\left(\frac{P(Y = 1)}{1  - P(Y = 1)}  \right) &= \beta_o + \beta_1 x_1 
\end{align*}`

So then

$$ 
\mbox{odds} = \exp \left( \beta_o + \beta_1 x_1 \right)
$$

---

### Interpretation of Coefficients

#### Define two concepts:

(2) Odds ratio: Comparison of odds between two groups

$$ 
\mbox{odds ratio} = \frac{\mbox{odds of group 1}}{\mbox{odds of group 2}}
$$

**Interpretation of odds ratio:** The odds of success in group 1 are _insert #_ times the odds of success in group 2.

**Note:** Also useful property:

$$
\exp(a + b) = \exp(a) \exp(b)
$$

---

### Interpretation of Coefficients

(estimated) odds for `$x_1 = t$` mm:

`$$\exp \left( \hat{\beta}_o + \hat{\beta}_1 t \right)$$`

(estimated) odds for `$x_1 = t + 1$` mm:

`$$\exp \left( \hat{\beta}_o + \hat{\beta}_1 (t + 1) \right)$$`

odds ratio:

`\begin{align*}
\frac{\exp \left( \hat{\beta}_o + \hat{\beta}_1 (t + 1) \right)}{\exp \left( \hat{\beta}_o + \hat{\beta}_1 t \right)} &= \frac{\exp \left( \hat{\beta}_o\right)\exp \left( \hat{\beta}_1 (t + 1) \right)}{\exp \left( \hat{\beta}_o\right)\exp \left( \hat{\beta}_1 t \right)} \\
& = \exp \left( \hat{\beta}_1 \right)
\end{align*}`

---

### Interpretation of Coefficients

How do we interpret `$\hat{\beta}_1 = 0.129$`?

```r
tidy(mod)
```

$$
\exp \left( \hat{\beta}_1 \right) = \frac{\mbox{odds of being Chinstrap for flipper length t + 1 mm}}{\mbox{odds of being Chinstrap for flipper length t mm}}
$$

```r
exp(0.129)
```

```
## [1] 1.13769
```

The odds that a penguin in Chinstrap instead of Adelie is 1.138 times the odds of a penguin whose flipper is 1 mm shorter.

---

### Prediction

How can I use this model to now predict, for a given flipper_length, whether a penguin is a Chinstrap or an Adelie?

.pull-left[

]

---

### Prediction

How can I use this model to now predict, for a given flipper_length, whether a penguin is a Chinstrap or an Adelie?

.pull-left[

]

.pull-right[

`\begin{align*}
\widehat{P(Y = 1)} \geq 0.5 &\rightarrow \hat{y} = 1 \\
\widehat{P(Y = 1)} < 0.5 &\rightarrow \hat{y} = 0 \\
\end{align*}`

]

---

### Prediction Accuracy

```r
penguins2 <- penguins2 %>%
  mutate(pred = mod$fitted.values)

penguins2 %>%
  select(species, pred)
```

```
## # A tibble: 219 × 2
##    species   pred
##    <fct>    <dbl>
##  1 Adelie  0.0886
##  2 Adelie  0.157 
##  3 Adelie  0.373 
##  4 Adelie  0.315 
##  5 Adelie  0.238 
##  6 Adelie  0.0886
##  7 Adelie  0.373 
##  8 Adelie  0.315 
##  9 Adelie  0.238 
## 10 Adelie  0.157 
## # … with 209 more rows
```

---

### Prediction Accuracy

```r
penguins2 <- penguins2 %>%
  mutate(pred_species = case_when(
    pred >= 0.5 ~ "Chinstrap",
    pred < 0.5 ~ "Adelie"
  ))

penguins2 %>%
  count(species, pred_species)
```

```
## # A tibble: 4 × 3
##   species   pred_species     n
##   <fct>     <chr>        <int>
## 1 Adelie    Adelie         138
## 2 Adelie    Chinstrap       13
## 3 Chinstrap Adelie          46
## 4 Chinstrap Chinstrap       22
```

```r
(138 + 22)/219
```

```
## [1] 0.7305936
```

---

### Multiple Logistic Regression

.pull-left[

Another good explanatory variable: `bill_length_mm`

```r
mod2 <- glm(species ~ flipper_length_mm 
            + bill_length_mm,
            family = "binomial",
            data = penguins2)
tidy(mod2)
```

```
## # A tibble: 3 × 5
##   term              estimate std.error statistic      p.value
##   <chr>                <dbl>     <dbl>     <dbl>        <dbl>
## 1 (Intercept)        -21.8     11.2        -1.95 0.0515      
## 2 flipper_length_mm   -0.171    0.0740     -2.32 0.0205      
## 3 bill_length_mm       1.25     0.228       5.47 0.0000000451
```

]

.pull-right[

]

---

### Multiple Logistic Regression

```r
penguins2 <- penguins2 %>%
  mutate(pred_mod2 = mod2$fitted.values) %>%
  mutate(pred_species2 = case_when(
    pred_mod2 >= 0.5 ~ "Chinstrap",
    pred_mod2 < 0.5 ~ "Adelie"
  ))
```

```r
penguins2 %>%
  count(species, pred_species2)
```

```
## # A tibble: 4 × 3
##   species   pred_species2     n
##   <fct>     <chr>         <int>
## 1 Adelie    Adelie          148
## 2 Adelie    Chinstrap         3
## 3 Chinstrap Adelie            6
## 4 Chinstrap Chinstrap        62
```

* Like with linear regression, still need to be careful that you are finding the general trend and not just over-fitting the sample data!

---

class: inverse, middle, center

## What other modeling techniques are out there beyond regression?!