background-image: url("img/logo_padded.001.jpeg") background-position: left background-size: 60% class: middle, center, .pull-right[ <br> ## .base_color[Code Smells & Refactoring] <br> #### .navy[Kelly McConville] #### .navy[ Stat 108 | Week 9 | Spring 2023] ] --- ## Announcements * Project 1 * P-Set 5 is due at 10pm on Wed. * P-Set 6 released tomorrow morning but not due until April 12th. * No lecture quiz. Focus on the Project 1 peer feedback! ************************ ## Week's Goals .pull-left[ **Mon Lecture** * Iteration via `for()` and `while()` loops * Functional Programming via `purrr` ] .pull-right[ **Wed Lecture** * Finish up Monday's slides * Code smells * Refactoring code ] --- ### Project 1 Check-In #### Timeline * 3/30 (noon): Post a working draft of your dashboard to [https://www.shinyapps.io/](shinyapps.io/) + Only one group member needs to post the group's app to shinyapps.io. * 3/30 (noon): Post the link to the group's dashboard on the shared spreadsheet * 3/30 - 4/1: Peer/TF feedback period + Section time that week will be devoted to feedback activities. * 4/5 (10pm): Link for the final version of dashboard should be posted to the shared spreadsheet and a PDF of your data scientist's statement should be submitted on Gradescope. #### Project 1 Peer Feedback Activity * Here are the instructions for the [feedback activity](https://docs.google.com/document/d/1LFBp51p0UVT0ei-1iFPz5ItxADjpbpEU6rHAkCq1AFU/edit?usp=sharing). + I will also send out a link on Slack to these instructions at noon tomorrow. --- ### Communicating through Code .pull-left[ * Today we are focusing on **writing better code**. + This is a important part of sharing our code/communicating the goals of our data work through our code. ] .pull-right[ <img src="img/STAT108Logo_Sharing.png" width="50%" style="display: block; margin: auto;" /> ] --- ### Code Smells and Refactoring * **Code Smells**: Structures in code that suggest **refactoring** is needed. -- * Refactor: Make code + Easier to understand + Easier to contribute to + But without changing the observable behavior -- * Not talking about `factor()`s! --- ### R Coding Hard Rules Do you have any hard rules? -- * Don't use + `attach()`. + `setwd()`. + A workflow that requires `rm(list=ls())` * Rarely write code directly in the console. (Almost) always write it in an R script or Rmd. * Don’t Repeat Yourself (DRY). * Use consistent styling. --- ### Refactoring * Loads of R coding *soft rules*. * Example: Comment your code. + But how? How much? * My opinionated commenting rules: + Consider your audience. What does the reader of your code need to know? + Try to write readable code (i.e., good names, helper functions, spaces, ...) first. + Then add comments to syntax-heavy code or as an alert. + Be succinct in the comments you do write. + Use comments to help break up large blocks of code into its main purpose. ```r # ---------------- Load Data # ---------------- Wrangle Data # ---------------- Create Visuals ``` --- ### Refactoring Today we are going to: * Detect a code smell. + A signal that more elegant code is needed. -- * Apply a particular refactoring. -- * Focus on code **revising**. + Even for experienced users, their first draft of new code will have smells. + To refactor, we must first work to understand what the code does. This will help us see what assumptions we've made along the way. --- * What code smells exist in the example from the last week's Section activity? * How should we refactor? ```r thing <- function(na.rm = FALSE, x = c(7, 1), y = c(3, 4)) { var1 <- mean(x, na.rm = TRUE) var2 <- mean(y, na.rm = TRUE) xx <- var(x, na.rm = TRUE) yy <- var(y, na.rm = TRUE) na <- length(x) nb <- length(y) df <- min(na, nb) - 1 important_bit <- (var1 - var2 - 0)/sqrt(xx/na + yy/nb) pt(q = abs(important_bit), df = df, lower.tail = FALSE)*2 } ``` --- * After refactoring, make sure to still test it! ```r two_sample_t_test <- function(x, y, na.rm = FALSE, null = 0, one_sided = FALSE) { stopifnot(is.numeric(x)) stopifnot(is.numeric(y)) mean_x <- mean(x, na.rm = na.rm) mean_y <- mean(y, na.rm = na.rm) var_x <- var(x, na.rm = na.rm) var_y <- var(y, na.rm = na.rm) n_x <- sum(!is.na(x)) n_y <- sum(!is.na(y)) df <- min(n_x, n_y) - 1 test_stat <- (mean_x - mean_y - null)/sqrt(var_x/n_x + var_y/n_y) p_value <- pt(q = abs(test_stat), df = df, lower.tail = FALSE) return(data.frame(test_stat = test_stat, df = df, p_value = ifelse(one_sided, p_value, p_value * 2))) } ``` --- ### Jenny Bryan's Bizarro Function ```r x <- 1:5 #x <- c(TRUE, FALSE, FALSE, TRUE, FALSE) cat( "The bizarro version of x is", -x, #!x, "\n" ) ``` ``` ## The bizarro version of x is -1 -2 -3 -4 -5 ``` --- ### Jenny's Bizarro Function ```r #x <- 1:5 x <- c(TRUE, FALSE, FALSE, TRUE, FALSE) cat( "The bizarro version of x is", #-x, !x, "\n" ) ``` ``` ## The bizarro version of x is FALSE TRUE TRUE FALSE TRUE ``` * What is the code smell? --- ### Commenting Changes Behavior * Guideline: Do not comment and uncomment sections of code to alter the behavior. ```r #x <- 1:5 x <- c(TRUE, FALSE, FALSE, TRUE, FALSE) cat( "The bizarro version of x is", #-x, !x, "\n" ) ``` ``` ## The bizarro version of x is FALSE TRUE TRUE FALSE TRUE ``` --- ### Commenting Changes Behavior * Fix with `if () else ()` ```r x <- 1:5 #x <- c(TRUE, FALSE, FALSE, TRUE, FALSE) cat( "The bizarro version of x is", if (is.numeric(x)) { -x } else { !x }, "\n" ) ``` ``` ## The bizarro version of x is -1 -2 -3 -4 -5 ``` --- ### Commenting Changes Behavior * Fix with `if () else ()` but... + Use `if () else ()` in moderation ```r #x <- 1:5 x <- c(TRUE, FALSE, FALSE, TRUE, FALSE) cat( "The bizarro version of x is", if (is.numeric(x)) { -x } else { !x }, "\n" ) ``` ``` ## The bizarro version of x is FALSE TRUE TRUE FALSE TRUE ``` * Now that we have a working example, how can we fix the first comment/uncomment code smell? --- ### Guideline: Use Functions! ```r bizarro <- function(x) { if (is.numeric(x)) { -x } else { !x } } # Test it! bizarro(x = 1:5) ``` ``` ## [1] -1 -2 -3 -4 -5 ``` ```r bizarro(x = c(TRUE, FALSE, FALSE, TRUE, FALSE)) ``` ``` ## [1] FALSE TRUE TRUE FALSE TRUE ``` --- * Code smell? ```r bizarro <- function(x) { if (class(x)[[1]] == "numeric" || class(x)[[1]] == "integer") { -x } else if (class(x)[[1]] == "logical") { !x } else { stop( "Don't know how to make bizzaro <", class(x)[[1]], ">") } } bizarro(c(TRUE, FALSE, FALSE, TRUE, FALSE)) ``` ``` ## [1] FALSE TRUE TRUE FALSE TRUE ``` ```r bizarro(1:5) ``` ``` ## [1] -1 -2 -3 -4 -5 ``` ```r bizarro(c("abc", "def")) ``` ``` ## Error in bizarro(c("abc", "def")): Don't know how to make bizzaro <character> ``` --- ## Guidelines: * Use proper functions for handling class * Use simple conditions. * Use well-written existing functions. ```r bizarro <- function(x) { if (is.numeric(x)) { -x } else if (is.logical(x)) { !x } else { stop( "Don't know how to make bizzaro <", class(x)[[1]], ">") } } bizarro(c(TRUE, FALSE, FALSE, TRUE, FALSE)) ``` ``` ## [1] FALSE TRUE TRUE FALSE TRUE ``` ```r bizarro(1:5) ``` ``` ## [1] -1 -2 -3 -4 -5 ``` --- ### Guideline: Stop Early ```r bizarro <- function(x) { stopifnot(is.numeric(x) || is.logical(x)) if (is.numeric(x)) { -x } else { !x } } bizarro(c(TRUE, FALSE, FALSE, TRUE, FALSE)) ``` ``` ## [1] FALSE TRUE TRUE FALSE TRUE ``` ```r bizarro(1:5) ``` ``` ## [1] -1 -2 -3 -4 -5 ``` ```r bizarro(c("abc", "def")) ``` ``` ## Error in bizarro(c("abc", "def")): is.numeric(x) || is.logical(x) is not TRUE ``` --- ### Create some demo data ```r library(pdxTrees) library(tidyverse) pdxTrees_demo <- get_pdxTrees_parks() %>% select(Species, Tree_Height) %>% head(n = 10) pdxTrees_demo ``` ``` ## # A tibble: 10 × 2 ## Species Tree_Height ## <chr> <dbl> ## 1 PSME 105 ## 2 PSME 94 ## 3 CRLA 23 ## 4 QURU 28 ## 5 PSME 102 ## 6 PSME 95 ## 7 PSME 103 ## 8 PSME 105 ## 9 PSME 97 ## 10 PSME 112 ``` --- * Code smell? ```r pdxTrees_demo <- pdxTrees_demo %>% mutate(ht_cat = if_else(Tree_Height < 30, "short", if_else(Tree_Height < 100, "medium", if_else(Tree_Height < 110, "tall", "very tall")))) pdxTrees_demo ``` ``` ## # A tibble: 10 × 3 ## Species Tree_Height ht_cat ## <chr> <dbl> <chr> ## 1 PSME 105 tall ## 2 PSME 94 medium ## 3 CRLA 23 short ## 4 QURU 28 short ## 5 PSME 102 tall ## 6 PSME 95 medium ## 7 PSME 103 tall ## 8 PSME 105 tall ## 9 PSME 97 medium ## 10 PSME 112 very tall ``` --- ### Oniony Code ```r pdxTrees_demo <- pdxTrees_demo %>% mutate(ht_cat = if_else(Tree_Height < 30, "short", if_else(Tree_Height < 100, "medium", if_else(Tree_Height < 110, "tall", "very tall")))) pdxTrees_demo ``` ``` ## # A tibble: 10 × 3 ## Species Tree_Height ht_cat ## <chr> <dbl> <chr> ## 1 PSME 105 tall ## 2 PSME 94 medium ## 3 CRLA 23 short ## 4 QURU 28 short ## 5 PSME 102 tall ## 6 PSME 95 medium ## 7 PSME 103 tall ## 8 PSME 105 tall ## 9 PSME 97 medium ## 10 PSME 112 very tall ``` --- ### Guideline: Less indents are easier to read ```r pdxTrees_demo <- pdxTrees_demo %>% mutate(ht_cat = case_when( Tree_Height < 30 ~ "short", Tree_Height < 100 ~ "medium", Tree_Height < 110 ~ "tall", TRUE ~ "very tall" )) pdxTrees_demo ``` ``` ## # A tibble: 10 × 3 ## Species Tree_Height ht_cat ## <chr> <dbl> <chr> ## 1 PSME 105 tall ## 2 PSME 94 medium ## 3 CRLA 23 short ## 4 QURU 28 short ## 5 PSME 102 tall ## 6 PSME 95 medium ## 7 PSME 103 tall ## 8 PSME 105 tall ## 9 PSME 97 medium ## 10 PSME 112 very tall ``` --- * Code Smell? ```r avg <- 5 std_dev <- 2 z_score <- function(x){ (x - avg)/std_dev } z_score(2) ``` ``` ## [1] -1.5 ``` --- ### Guideline: Beware of Global Data ```r z_score <- function(x, avg, std_dev){ (x - avg)/std_dev } z_score(x = 2, avg = 5, std_dev = 2) ``` ``` ## [1] -1.5 ``` --- * Code Smells? ```r two_sample_t_test <- function(x, y, na.rm = FALSE, null = 0) { mean_x <- sum(x)/length(x) mean_y <- sum(y)/length(y) var_x <- sum((x - sum(x)/length(x))^2/(length(x)-1)) var_y <- sum((y - sum(y)/length(y))^2/(length(y)-1)) n_x <- length(x) n_y <- length(y) df <- min(n_x, n_y) - 1 test_stat <- (mean_x - mean_y - null)/sqrt(var_x/n_x + var_y/n_y) p_value <- pt(q = abs(test_stat), df = df, lower.tail = FALSE)*2 return(data.frame(test_stat = test_stat, p_value = p_value)) } # Test it x <- rnorm(n = 10) y <- rnorm(n = 20) two_sample_t_test(na.rm = TRUE, x = x, y = y) ``` ``` ## test_stat p_value ## 1 0.4925738 0.6341068 ``` --- ### Guideline: Aggressively Decomposing Work in Little Functions * Don't write monster functions! * Small well-named helper functions are better than one BIG function with loads of commented code. ```r test_stat_t <- function(x, y, na.rm = FALSE, null = null){ mean_x <- mean(x, na.rm = na.rm) mean_y <- mean(y, na.rm = na.rm) var_x <- var(x, na.rm = na.rm) var_y <- var(y, na.rm = na.rm) n_x <- sum(!is.na(x)) n_y <- sum(!is.na(y)) (mean_x - mean_y - null)/sqrt(var_x/n_x + var_y/n_y) } df_t <- function(n_x, n_y) min(n_x, n_y) - 1 p_value_t <- function(test_stat, df){ pt(q = abs(test_stat), df = df, lower.tail = FALSE)*2 } ``` --- ### Guideline: Aggressively Decompose Code into Little Functions * If the helper functions are sufficiently well-named, then you likely don't need to look at the body of these functions to understand the code. ```r two_sample_t_test <- function(x, y, na.rm = FALSE, null = 0) { test_stat <- test_stat_t(x = x, y = y, na.rm = na.rm, null = null) df <- df_t(sum(!is.na(x)), sum(!is.na(y))) p_value <- p_value_t(test_stat, df) return(data.frame(test_stat = test_stat, df = df, p_value = p_value)) } # Test it two_sample_t_test(na.rm = TRUE, x = x, y = y) ``` ``` ## test_stat df p_value ## 1 0.4925738 9 0.6341068 ``` --- ### Guideline: Vectorize code whenever possible ```r system.time({ x <- rnorm(n = 100000, mean = 0, sd = 1) y <- rnorm(n = 100000, mean = 2, sd = 2) z <- x + y }) ``` ``` ## user system elapsed ## 0.010 0.003 0.014 ``` ```r system.time({ z <- rep(NA, 100000) for(i in seq_along(z)){ z[i] <- rnorm(n = 1, mean = 0, sd = 1) + rnorm(n = 1, mean = 2, sd = 2) } }) ``` ``` ## user system elapsed ## 0.429 0.017 0.446 ``` --- ### Refactoring Recap * First write code that works. * Then refactor it to be more + Readable + Efficient + Defensive + General * But make sure the code still works after the refactoring! --- ### Closing Thought * Jenny Bryan inspired much of the discussion today. Reading her materials and watching her presentations has made me a better coder. * Close with a thought from her: <img src="img/cakes.png" width="65%" style="display: block; margin: auto;" />