background-image: url("img/logo_padded.001.jpeg")
background-position: left
background-size: 60%
class: middle, center,

.pull-right[

<br>

## .base_color[R Packages]

<br>

#### .navy[Kelly McConville]

#### .navy[ Stat 108 | Week 10 | Spring 2023]

]

---

## Announcements

* Fill out [this form](https://forms.gle/sY39jB8WLSdFYwfMA) to help us create the Project 2 groups.

* Project 1 is due at 10pm on Wednesday.
    + Come by office hours with questions.

************************

## Week's Goals

.pull-left[

**Mon Lecture**

* Why `R` packages?

* `R` package creation demo

]

.pull-right[

**Wed Lecture**

* Documentation

* Package metadata

]

---

### Project 1

#### Suggestions

* Make sure to consider the peer feedback and to go through the Rubric/Requirements.

* Going from MVP to final product should take **several** iterations.

* Don't just focus on functionality. Also consider layout and aesthetics.

* Graphs should be high-quality, include appropriate context, and follow best practices.

* Don't forget about the Data Scientist's Statement.

#### Feedback after looking over the dashboards:

* Use `theme()` or `theme_update()` to increase the font sizes.

* Don't worry if your app takes a while to load. (With a paid shinyapps.io plan, it would load faster.)

* Include more text to orient the user, but be concise.
    + Use **bolding/color** and headers to help the user pull out the most relevant text.

---

### Refactoring Recap

* First write code that works.

* Then refactor it to be more
    + Readable
    + Efficient
    + Defensive
    + General

* But make sure the code still works after the refactoring!

---

### One More Refactoring Thought

* Jenny Bryan inspired much of the discussion from last Wednesday's lecture. Reading her materials and watching her presentations has made me a better coder.
* Close with a thought from her:

<img src="img/cakes.png" width="65%" style="display: block; margin: auto;" />

---

class: middle, center

## Sharing Code

<img src="img/STAT108Logo_Sharing.png" width="30%" style="display: block; margin: auto;" />

#### How do you share code?

---

### Sharing Code

#### Options

* `R` scripts and `R` Markdown documents
* RStudio Projects
* GitHub repository
* `R` package

#### How to pick between the options?

---

### When To Create an `R` Package

* When you have recurring operations that occur across multiple projects.

* When you want to share data and documentation without worrying about file paths, file types, the documentation getting lost, ...

* **BUT**, you will still use an RStudio Project (with a corresponding GitHub repo) for the code and data that are specific to a given project.

---

### R Packages

* What is an R package?

--

> "R packages are the fundamental unit of R-ness." -- Jenny Bryan

--

* Contains functions, (possibly) datasets, documentation, and tests.

* "base R": 14 base packages ship with R, and 7 of them (including `base`, `stats`, and `utils`) are attached when R starts.
    + Another 15 "recommended" packages are also installed with R.

* CRAN has > 19,000 more packages
    + `install.packages("dplyr")`
    + `library(dplyr)`

* And then there are all the packages on GitHub:
    + `devtools::install_github("tidyverse/dplyr")`
    + `library(dplyr)`

---

### R Data Packages

* Great way to share data!

* Example 1:
    + `library(mosaicData)`
    + `data(package = "mosaicData")`
    + `?Births2015`

--

* Example 2:
    + `library(pdxTrees)`
    + `get_pdxTrees_parks()`
    + `get_pdxTrees_streets()`

--

* Example 3:
    + `library(gbfs)`

---

### R Packages

#### Why create an R package?

* Very portable.

* Includes documentation.

* Provides a useful structure for organizing your work:
    + `R` folder: Code
    + `tests` folder: Testing functions
    + `data` folder: Storing data

* Lots of helper functions in other packages to automate parts of the process.
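---

### Sketch: Scaffolding a Package With `usethis`

The folder structure above does not have to be built by hand. Here is a minimal sketch, assuming `usethis`, `testthat`, and `roxygen2` are installed. The temporary path is only for illustration; in practice you would choose a real location for the package.

```r
library(usethis)

# Throwaway location for this sketch; create_package() needs the parent folder to exist
parent <- tempfile("pkgs")
dir.create(parent)
pkg <- file.path(parent, "camDogs")

# Creates DESCRIPTION, NAMESPACE, and the R/ folder without opening a new session
create_package(pkg, rstudio = FALSE, open = FALSE)

# Run the remaining helpers with the new package as the active project
with_project(pkg, {
  use_r("top10", open = FALSE)           # empty R/top10.R for the function code
  use_testthat()                         # tests/testthat/ folder, testthat added to Suggests
  use_data_raw("camDogs", open = FALSE)  # data-raw/camDogs.R script for preparing the data
})

# Inspect what was created
list.files(pkg, recursive = TRUE)
```

The `data` folder itself appears once the data-prep script is filled in and run. In current `usethis` versions, the generated template ends with a `usethis::use_data()` call, and that call is what writes `data/camDogs.rda`.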
---

### Creating an R Package

Key packages:

* [`devtools`](https://cran.r-project.org/web/packages/devtools/index.html): supports the development and dissemination of the package

* [`usethis`](https://usethis.r-lib.org/): automates steps of package creation, such as saving a dataset into the `data` folder

* [`roxygen2`](https://cran.r-project.org/web/packages/roxygen2/vignettes/roxygen2.html): simplifies writing documentation

---

class: middle, center

## Let's Build an `R` Package!

---

#### Our Package: `camDogs`

**Goals**:

* Share the [Dogs of Cambridge dataset](https://data.cambridgema.gov/General-Government/Dogs-of-Cambridge/sckh-3xyx), along with documentation.

* Create a function that outputs a dataset of the dogs with the 10 most common names.

* Useful MVP code:

```r
library(tidyverse)

camDogs <- read_csv("https://data.cambridgema.gov/api/views/sckh-3xyx/rows.csv?accessType=DOWNLOAD")

# Flag each dog as a mixed breed or a single breed
camDogs <- mutate(camDogs,
                  Breed = if_else(Dog_Breed == "Mixed Breed", "Mixed", "Single"))

top10 <- function(data, x){
  # Find the 10 most common values of x (ties at the cutoff can add more)
  top10x <- count(data, {{x}}) %>%
    slice_max(n = 10, order_by = n) %>%
    pull({{x}})

  # Filter the dataset to only the rows with those values of x
  return(filter(data, {{x}} %in% top10x))
}
```

---

### Steps

* Let's go through the "Creating an R Package" handout.

* I will demo the process with [this Dogs of Cambridge dataset](https://data.cambridgema.gov/General-Government/Dogs-of-Cambridge/sckh-3xyx).

* If following along, clone this repo: [https://github.com/harvard-stat108s23/pkgDemo](https://github.com/harvard-stat108s23/pkgDemo)

---

### Wrap-Up Thought

#### Most Common Confusion When Moving from Scripts/R Markdown to Package Writing

* Requires new ways of working with functions in other packages:
    + Declare dependencies in the `DESCRIPTION` file.
    + Call functions as `package_name::function_name()`.
    + Don't use `library(package_name)` inside package code.
    + Avoid unnecessary dependencies.

#### Other questions?
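---

### Sketch: `top10()` as Package Code

To make the wrap-up point concrete, here is one way the MVP `top10()` might look once it lives in the package's `R/` folder. This is a sketch, not the demo repo's actual code. It assumes `dplyr` is declared in `DESCRIPTION` (for example with `usethis::use_package("dplyr")`), drops `library()` and the `%>%` pipe, and calls each function as `dplyr::function_name()`. The toy data frame `dogs` is hypothetical.

```r
# R/top10.R (sketch): no library() calls; dplyr functions are namespaced
top10 <- function(data, x) {
  # Count how often each value of x occurs
  counts <- dplyr::count(data, {{ x }})

  # Keep the 10 most common values (ties at the cutoff can keep more)
  top_values <- dplyr::slice_max(counts, order_by = n, n = 10)
  top10x <- dplyr::pull(top_values, {{ x }})

  # Return only the rows whose x value is among the most common
  dplyr::filter(data, {{ x }} %in% top10x)
}

# Hypothetical toy data (not the Cambridge data): 11 names with counts 11, 10, ..., 1
dogs <- data.frame(name = rep(LETTERS[1:11], times = 11:1))
result <- top10(dogs, name)
nrow(result)            # 65: every row except the single "K"
"K" %in% result$name    # FALSE: K is the 11th most common name
```

`R CMD check` will likely flag `n` (the column created by `count()`) as a global variable with no visible binding; declaring it with `utils::globalVariables("n")` is one common workaround.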