background-image: url("img/logo_padded.001.jpeg")
background-position: left
background-size: 60%
class: middle, center,

.pull-right[

<br>

## .base_color[Control Flow]

<br>

#### .navy[Kelly McConville]

#### .navy[ Stat 108 | Week 8 | Spring 2023]

]

---

## Announcements

* Project 1 questions?
* P-Set 4 is due on Wed at 10pm.

************************

## Week's Goals

.pull-left[

**Mon Lecture**

* Control Flow

]

.pull-right[

**Wed Lecture**

* Functions

]

---

### Project 1 Check-In

* If you haven't already, make sure to read over the "Tips for getting started" section of the Project 1 instructions.
* Everyone should have access to their project group repo.
* Make sure to come by office hours and/or section with questions or to talk out your plan for your dashboard!

#### Timeline

* 3/30 (noon): Post a working draft of your dashboard to [shinyapps.io](https://www.shinyapps.io/).
    + Only one group member needs to post the group's app to shinyapps.io.
* 3/30 (noon): Post the link to the group's dashboard on the shared spreadsheet.
* 3/30 - 4/1: Peer/TF feedback period
    + Section time that week will be devoted to feedback activities.
* 4/5 (10pm): Link for the final version of dashboard should be posted to the shared spreadsheet and a PDF of your data scientist's statement should be submitted on Gradescope.

---

### Projects and Git/GitHub

* GitHub Repo = `RStudio` Project
* This means you need to create a new `RStudio` project that is synced with your group's GitHub repo that I created.
    + Instructions from [Week 3 Wed Slides](https://mcconvil.github.io/stat108s23/stat108_wk03wed.html#47)
* Don't need to create a new PAT!

---

### Git Branches

* Branch = Detour from main stream of development.
* Workflow:
    + Create a new branch.
    + Checkout (switch) to that branch.
    + Commit the work for that branch.
    + Merge it into the main branch.
        + Can also be done on GitHub via a Pull Request.
* If you have Git experience or want to try out branches, check out [Ch 22 in Happy Git with R](https://happygitwithr.com/git-branches.html#git-branches).
* For novices, I recommend staying on the main branch.

---

### Workflow

Once your GitHub repo and RStudio project are synced, here's your workflow:

* **Pull** the most recent version of the repo from GitHub to your RStudio project.
* Do some work on your project in RStudio.
* **Commit** that work.
    + Committing takes a snapshot of all the files in the project.
    + Look over the **Diff**, which shows what has changed since your last commit.
    + Include a quick note, the **Commit Message**, to summarize the motivation for the changes.
* **Push** your commit to GitHub from RStudio.

---

### Git Collaboration: Merge conflicts

* What if my collaborators and I both make changes?
    + Scenario: Your collaborator makes changes to a file, commits, and pushes to GitHub. You also modify that file, commit, and push.
    + Result: Your push will fail because there's a commit on GitHub that you don't have.
    + Usual Solution: Pull and *usually* Git will merge their work nicely with yours. Then push. If that doesn't work, you have a **merge conflict**. Let's cross that bridge when we get there.
* How to avoid merge conflicts?
    + Always pull when you are going to work on your project.
    + Always commit and push when you are done, even if you made small changes.

---

## Collaboration: Git Style

* **Projects**: Can be used to create to-do lists and stay organized.
* **Issues**: Useful method to communicate with your group members.

---

class: middle, center

### For the next couple of weeks, we will be [focusing on](https://docs.google.com/spreadsheets/d/1Ejyq-jcg7aCEqI3W4ahx4LrISTNewV2s6u-xozOx6Zg/edit?usp=sharing):

<img src="img/STAT108Logo_Programming.png" width="20%" style="display: block; margin: auto;" />

---

### Course Goal: Learn to use several `R` packages.
.pull-left[

<img src="img/box.png" width="45%" style="display: block; margin: auto;" />

]

.pull-right[

<img src="img/packages_wall.001.jpeg" width="85%" style="display: block; margin: auto;" />

]

---

### Course Goal: Learn to create our own functions in `R`.

.pull-left[

<img src="img/homemade.png" width="70%" style="display: block; margin: auto;" />

]

.pull-right[

```r
magic_eight_ball <- function(question = "Will it rain today?"){
  answers <- c("Without a doubt", "Concentrate and ask again",
               "My sources say no")
  return(sample(x = answers, size = 1))
}

magic_eight_ball(question = "Should I take Stat 108?")
```

```
## [1] "My sources say no"
```

]

In about a month, you will come full circle and bundle up your own functions into their own `R` package in Project 2!

---

### Control Flow

* The flow of execution of `R` code.
* So far, our primary rule: Place your code in the order that you want `R` to evaluate it.

.pull-left[

```r
crash_data <- read_csv("https://raw.githubusercontent.com/harvard-stat108s23/materials/main/psets/data/cambridge_cyclist_ped_crash.csv")
```

```
## Error in read_csv("https://raw.githubusercontent.com/harvard-stat108s23/materials/main/psets/data/cambridge_cyclist_ped_crash.csv"): could not find function "read_csv"
```

```r
library(tidyverse)
```

]

.pull-right[

```r
library(tidyverse)
crash_data <- read_csv("https://raw.githubusercontent.com/harvard-stat108s23/materials/main/psets/data/cambridge_cyclist_ped_crash.csv")
```

]

* Side note: Order didn't matter in the `shiny` `server()` function because it uses **declarative** programming instead of **imperative** programming.

---

### Control Flow

* Can also get `R` to conditionally run code blocks or run code blocks multiple times.
* Control structures allow you to control the flow of execution.
* Main tools we will use to control the flow:
    + `if()` and `else`
    + `stop()` and `stopifnot()`
    + `for()` loops

.pull-left[

```r
if(magic_eight_ball("Should I take Stat 108?") == "Without a doubt") {
  print("Stay seated")
} else {
  print("Leave and buy a coffee.")
}
```

```
## [1] "Stay seated"
```

]

.pull-right[

```r
for(i in 1:20){
  # Iterate over something awesome
}
```

]

---

### Come Back to Logical Operators

* To control the flow, we need to be really comfortable with logical operators.

Or operators:

.pull-left[

```r
a <- c(TRUE, FALSE, FALSE, TRUE)
b <- c(TRUE, FALSE)
a | b
```

```
## [1]  TRUE FALSE  TRUE  TRUE
```

```r
a || b
```

```
## Warning in a || b: 'length(x) = 4 > 1' in coercion to 'logical(1)'
```

```
## [1] TRUE
```

```r
xor(a, b)
```

```
## [1] FALSE FALSE  TRUE  TRUE
```

]

.pull-right[

* Which is [vectorized](https://mcconvil.github.io/stat108s23/stat108_wk04mon.html#44)?
    + When not, what is happening?
* What is recycling again?

]

---

### Logical Operators

And operators:

.pull-left[

```r
a <- c(TRUE, FALSE, FALSE, TRUE)
b <- c(TRUE, FALSE)
a & b
```

```
## [1]  TRUE FALSE FALSE FALSE
```

```r
a && b
```

```
## Warning in a && b: 'length(x) = 4 > 1' in coercion to 'logical(1)'
```

```
## Warning in a && b: 'length(x) = 2 > 1' in coercion to 'logical(1)'
```

```
## [1] TRUE
```

]

.pull-right[

* Which is vectorized?
    + When not, what is happening?
* What is recycling again?
]

---

### Logical Operators

Not operator:

```r
a <- c(TRUE, FALSE, FALSE, TRUE)
b <- c(TRUE, FALSE)
!a
```

```
## [1] FALSE  TRUE  TRUE FALSE
```

```r
!(a & b)
```

```
## [1] FALSE  TRUE  TRUE  TRUE
```

---

### Comparison Operators

.pull-left[

```r
x <- c(1, 3, 5)
y <- c(1, 4, 2)
z <- c(3, 2)
x < y
```

```
## [1] FALSE  TRUE FALSE
```

```r
x <= y
```

```
## [1]  TRUE  TRUE FALSE
```

```r
x > y
```

```
## [1] FALSE FALSE  TRUE
```

```r
x >= y
```

```
## [1]  TRUE FALSE  TRUE
```

]

.pull-right[

```r
x != y
```

```
## [1] FALSE  TRUE  TRUE
```

```r
x == y
```

```
## [1]  TRUE FALSE FALSE
```

```r
z %in% x
```

```
## [1]  TRUE FALSE
```

```r
x %in% z
```

```
## [1] FALSE  TRUE FALSE
```

* Vectorized?

]

---

### Conditional Control Flow

`R` will conditionally execute code via `if` statements.

The basic one-line form is:

```r
if (condition) true_action
if (condition) true_action else false_action
```

The basic multi-line form is:

```r
if (condition) {
  true_action
}

if (condition) {
  true_action
} else {
  false_action
}
```

---

### Conditional Control Flow

`R` will conditionally execute code via `if` statements.

.pull-left[

```r
x <- c(1, 3, 5)
if (3 %in% x){
  print("Vector contains 3.")
}
```

```
## [1] "Vector contains 3."
```

```r
if (4 %in% x){
  print("Vector contains 4.")
}
```

]

.pull-right[

```r
if (4 %in% x){
  print("Vector contains 4.")
} else {
  print("Vector does not contain 4.")
}
```

```
## [1] "Vector does not contain 4."
```

]

---

### Conditional Control Flow

`if()` is not vectorized!

```r
x <- c(1, 3, 5)
if (x == 1){
  print("Vector contains 1.")
}
```

```
## Error in if (x == 1) {: the condition has length > 1
```

```r
if (x == 3){
  print("Vector contains 3.")
}
```

```
## Error in if (x == 3) {: the condition has length > 1
```

---

### Conditional Control Flow

`if()` is not vectorized!
```r
if (x %in% 3){
  print("Vector contains 3.")
}
```

```
## Error in if (x %in% 3) {: the condition has length > 1
```

```r
if (3 %in% x){
  print("Vector contains 3.")
}
```

```
## [1] "Vector contains 3."
```

---

### Converting A Warning to an Error

With R 3.5.0(+), you can turn the length warning into an error. (As of R 4.2.0, a condition of length > 1 is always an error, which is why the earlier examples already fail.)

```r
Sys.setenv("_R_CHECK_LENGTH_1_CONDITION_" = "true")
x <- c(1, 3, 5)
if (x == 3){
  print("Vector contains 3.")
}
```

```
## Error in if (x == 3) {: the condition has length > 1
```

---

### Collapsing Logical Vectors

Because `if` is not vectorized, it is often helpful to collapse logical vectors to a single value.

.pull-left[

```r
x <- c(1, 3, 5)
any(x == 1)
```

```
## [1] TRUE
```

```r
if (any(x == 1)){
  print("Vector contains 1.")
}
```

```
## [1] "Vector contains 1."
```

```r
if (any(x == 3)){
  print("Vector contains 3.")
}
```

```
## [1] "Vector contains 3."
```

]

.pull-right[

```r
all(x <= 4)
```

```
## [1] FALSE
```

```r
all(x <= 5)
```

```
## [1] TRUE
```

]

---

### Continuing the Conditional Executions

.pull-left[

```r
x = 3
if (x < 0 ) {
  print("x is negative")
} else if (x > 0) {
  print("x is positive")
} else {
  print("x is zero")
}
```

```
## [1] "x is positive"
```

]

.pull-right[

```r
x = 0
if (x < 0 ) {
  print("x is negative")
} else if (x > 0) {
  print("x is positive")
} else {
  print("x is zero")
}
```

```
## [1] "x is zero"
```

]

---

### Saving Results

.pull-left[

```r
x <- 82
if (x >= 90) {
  "A"
} else if (x >= 80) {
  "B"
} else if (x >= 70) {
  "C"
} else if (x >= 60) {
  "D"
} else {
  "F"
}
```

```
## [1] "B"
```

]

.pull-right[

```r
if (x >= 90) {
  grade = "A"
} else if (x >= 80) {
  grade = "B"
} else if (x >= 70) {
  grade = "C"
} else if (x >= 60) {
  grade = "D"
} else {
  grade = "F"
}
grade
```

```
## [1] "B"
```

]

---

### Vectorized `if()` and `else`

.pull-left[

* `ifelse()` runs operations on an entire vector.
```r
x <- c(82, 74, 95)
grade <- ifelse(x >= 90, "A", "not A")
grade
```

```
## [1] "not A" "not A" "A"
```

]

.pull-right[

* `case_when()` provides a more flexible alternative.

```r
grade <- case_when(
  x >= 90 ~ "A",
  x >= 80 ~ "B",
  x >= 70 ~ "C",
  x >= 60 ~ "D",
  TRUE ~ "F"
)
grade
```

```
## [1] "B" "C" "A"
```

]

---

### Error Checking

* Often want to validate user inputs or function assumptions.
    + Will see this in action next time when discussing functions.
* If an input is incorrect or an assumption isn't met, then want to report the error and **stop the code**.

```r
x <- c(82, 74, 95)
stopifnot(is.numeric(x))
stopifnot(is.character(x))
```

```
## Error: is.character(x) is not TRUE
```

* Want a more helpful error message!

---

### Error Checking

* Check inputs with `if()`
* And then `stop()` if they are wrong.

```r
x <- c(82, 74, 95)
if(!is.character(x)) {
  stop('This function only works for character inputs.\n',
       'You have provided an object of class: ', class(x))
}
```

```
## Error in eval(expr, envir, enclos): This function only works for character inputs.
## You have provided an object of class: numeric
```

---

### Missing Values

* Missing values are important in statistics.
    + A missing value doesn't mean 0.
* As we've seen, `NA` = missing value in `R`.
* But the class of an `NA` depends on the type of the atomic vector that contains it.

```r
class(NA)
```

```
## [1] "logical"
```

```r
class(NA + 1)
```

```
## [1] "numeric"
```

```r
class(c(NA, "cat"))
```

```
## [1] "character"
```

---

### Missing Values

We need to make sure we control how `NA`s are handled because they impact calculations.

```r
x <- c(82, 74, 95, NA)
x + 5
```

```
## [1]  87  79 100  NA
```

```r
mean(x)
```

```
## [1] NA
```

```r
mean(x, na.rm = TRUE)
```

```
## [1] 83.66667
```

---

### Missing Values and Logicals

`NA`s and logicals can exhibit interesting behavior.
```r
TRUE & NA
```

```
## [1] NA
```

```r
FALSE & NA
```

```
## [1] FALSE
```

```r
TRUE | NA
```

```
## [1] TRUE
```

```r
FALSE | NA
```

```
## [1] NA
```

---

### Conditionals and Missing Values

Need to handle `NA`s carefully when writing conditionals.

```r
y <- 80
if(y != NA) {
  print("Took the exam.")
}
```

```
## Error in if (y != NA) {: missing value where TRUE/FALSE needed
```

```r
x <- c(82, 74, 95, NA)
if (all(x >= 60)){
  print("Everyone passed! :)")
}
```

```
## Error in if (all(x >= 60)) {: missing value where TRUE/FALSE needed
```

```r
if (any(x >= 60)){
  print("At least one student passed.")
}
```

```
## [1] "At least one student passed."
```

---

### Test for Missing Values

* After testing for `NA`s, how you handle vectors with or without `NA`s might differ.

```r
is.na(NA)
```

```
## [1] TRUE
```

```r
y <- 80
# Use !is.na() so the message prints only when the score is present.
if(!is.na(y)) {
  print("Took the exam.")
}
```

```
## [1] "Took the exam."
```

```r
any(is.na(x))
```

```
## [1] TRUE
```

```r
all(is.na(x))
```

```
## [1] FALSE
```

---

### Control Flow Recap

* The flow of execution of `R` code.
* Order still matters.
* But now we can:
    + Run subsets of the code with `if()` and `else`.
    + Check for errors with `stop()`.
* Will add in `for()` loops next week.
* Also need to weave in best practices.
    + Lots of ways to do something. Which is best?
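
---

### Control Flow Recap: A Sketch

The recap items above can be combined into one small example. This is only a sketch, not code from the lecture: `letter_grade()` is a hypothetical helper name, and the cutoffs are borrowed from the grading examples earlier in these slides. It checks its input with `if()` and `stop()`, then uses the vectorized `ifelse()` so that a whole vector of scores, including `NA`s, is handled in one call.

```r
# Hypothetical helper (an assumption for illustration, not from the lecture):
# converts a numeric vector of scores into letter grades.
letter_grade <- function(score) {
  # Error checking: stop with a helpful message if the input is not numeric.
  if (!is.numeric(score)) {
    stop("This function only works for numeric inputs.\n",
         "You have provided an object of class: ", class(score)[1])
  }
  # ifelse() is vectorized, so each score is compared separately.
  # A missing score gives a missing grade because the comparison is NA.
  ifelse(score >= 90, "A",
         ifelse(score >= 80, "B",
                ifelse(score >= 70, "C",
                       ifelse(score >= 60, "D", "F"))))
}

scores <- c(82, 74, 95, NA)
grades <- letter_grade(scores)
grades

# Collapse logical vectors to a single value before using them in if().
if (any(is.na(grades))) {
  print("At least one score is missing.")
}
```

Calling `letter_grade("82")` stops with the custom error message rather than returning a misleading grade. Nesting `ifelse()` keeps the sketch in base `R`; `case_when()` would be a more readable choice once `dplyr` is loaded.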