background-image: url("img/logo_padded.001.jpeg")
background-position: left
background-size: 60%
class: middle, center

.pull-right[

<br>

## .base_color[R Packages]

## .base_color[Documentation]

<br>

#### .navy[Kelly McConville]

#### .navy[ Stat 108 | Week 10 | Spring 2023]

]

---

## Announcements

* Project 1 is due today at 10pm.
* Fill out the following two forms:
    + By April 9th: [Project 2 Group Matching Form](https://forms.gle/sY39jB8WLSdFYwfMA)
    + By April 14th: [Project 1 Group Work Form](https://forms.gle/C2iFjTo3Ww1CMsFy5)

************************

## Week's Goals

.pull-left[

**Mon Lecture**

* Why `R` packages?
* `R` package creation demo

]

.pull-right[

**Wed Lecture**

* Documentation
* Package metadata

]

---

### R Packages

#### Recap

* Provides a useful structure for organizing your work:
    + `R` folder: Code
    + `tests` folder: Testing functions
    + `data-raw` folder: Wrangling data
    + `data` folder: Storing data
* Lots of helper functions in other packages to automate parts of the process.

---

#### Our Package: `camDogs`

* Share the [Dogs of Cambridge dataset](https://data.cambridgema.gov/General-Government/Dogs-of-Cambridge/sckh-3xyx), along with documentation.
* Includes `top10()`, a function that outputs a dataset of the dogs with the 10 most common names.
* Working copy can be found at [https://github.com/harvard-stat108s23/camDogs](https://github.com/harvard-stat108s23/camDogs)

#### Most Common Confusion When Moving from Scripts/R Markdowns to Package Writing

* Requires new ways of working with functions in other packages.
* `DESCRIPTION` file to declare dependencies.
* Use `package_name::function_name()`.
* Can't use `library(package_name)`.
* Want to avoid unnecessary dependencies.

#### Other questions so far?

---

class: middle, center

## Let's go through some of the important components of the package more slowly now.
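---

### R Packages

#### Sketch: Calling Functions from Other Packages

Returning to the dependency points from the recap, here is a minimal sketch of how a function like `top10()` might use `package_name::function_name()` instead of `library()`. This is an illustration under stated assumptions, not the actual `camDogs` source: the choice of `dplyr` and `utils`, the function body, and the toy `dogs` data are all hypothetical.

```r
# Hypothetical sketch of R/top10.R; the real camDogs code may differ.
# In a package, dplyr would also be declared under Imports in DESCRIPTION,
# for example by running usethis::use_package("dplyr") once.
top10 <- function(data, x) {
  # Count each value of x, most common first (no library(dplyr) needed)
  counts <- dplyr::count(data, {{ x }}, sort = TRUE)
  # Keep the first 10 values; ties at the cutoff are broken by row order
  top_values <- dplyr::pull(utils::head(counts, 10), {{ x }})
  # Return only the rows whose value of x is among those values
  dplyr::filter(data, {{ x }} %in% top_values)
}

# Toy data (hypothetical): 10 names appear twice, 2 names appear once
dogs <- data.frame(Dog_Name = c(rep(letters[1:10], each = 2), "k", "l"))
top_dogs <- top10(dogs, Dog_Name)
```

Every call is prefixed with its package, so the function works without attaching anything to the user's search path.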
---

### Package States

#### Five states:

* Source
    + Directory of files with a specific structure
    + State of [`camDogs` on GitHub](https://github.com/harvard-stat108s23/camDogs)
* Bundled
    + Compressed into a single file, a "source tarball" (`.tar.gz`)
    + Available for CRAN packages: [https://cran.r-project.org/web/packages/forcats/index.html](https://cran.r-project.org/web/packages/forcats/index.html)
* Binary
    + For distribution
    + Handled by CRAN
    + Platform specific
* Installed
    + `install.packages()`
    + `devtools::install_github()`
* In-memory
    + `library()`

---

### Ignoring Files in the Build

* `.Rbuildignore`
    + To add to the file: `usethis::use_build_ignore()`
* What to add to `.Rbuildignore`
    + Files that help generate contents programmatically
        + Ex: `DATASET.R`
    + Files that help with the package development that aren't standard (by CRAN terms)
        + Ex: `pkgdown` files

---

### The Package Name

#### Hard Rules:

1. The name can only consist of letters, numbers, and periods, i.e., `.`.
    + No `_` allowed!
    + But don't use periods.
2. It must start with a letter.
3. It cannot end with a period.

---

### The Package Name

#### Other Guidelines

* Pick a unique name.
* Check if it is already in use.

```r
available::available("camDogs")
```

```
## ── camDogs ─────────────────────────────────────────────────────────────────────
## Name valid: ✔
## Available on CRAN: ✔
## Available on Bioconductor: ✔
## Available on GitHub: ✖
## Abbreviations: http://www.abbreviations.com/cam
## Wikipedia: https://en.wikipedia.org/wiki/cam
## Wiktionary: https://en.wiktionary.org/wiki/cam
## Sentiment:???
## Abbreviations: http://www.abbreviations.com/Dogs
## Wikipedia: https://en.wikipedia.org/wiki/Dogs
## Wiktionary: https://en.wiktionary.org/wiki/Dogs
## Sentiment:???
## Abbreviations: http://www.abbreviations.com/camD
## Wikipedia: https://en.wikipedia.org/wiki/camD
## Wiktionary: https://en.wiktionary.org/wiki/camD
## Sentiment:???
## Abbreviations: http://www.abbreviations.com/ogs
## Wikipedia: https://en.wikipedia.org/wiki/ogs
## Wiktionary: https://en.wiktionary.org/wiki/ogs
## Sentiment:???
## Abbreviations: http://www.abbreviations.com/camDogs
## Wikipedia: https://en.wikipedia.org/wiki/camDogs
## Wiktionary: https://en.wiktionary.org/wiki/camDogs
## Sentiment:???
```

---

### The Package Name

#### Other Guidelines

* Ask for suggestions (but maybe not from `available`):

```r
available::suggest("Dogs in Cambridge, MA")
```

```
## dogsr
```

```r
available::suggest("Cambridge, MA Dogs")
```

```
## cambridger
```

---

### The Package Name

#### Other Guidelines

* Avoid mixing upper and lower case, for readability.
    + If choosing between `MASE` and `mase`, go lower-case.
* Find a name that rolls off the tongue.
    + `purrr`
* Don't include a version number:
    + `ggplot2`
* Capture the goal of the package in the name:
    + `forcats` is an anagram of factors, which we use for categorical data.
    + `lubridate` makes dates and times easier.
* Maybe tack on an `r`.
    + `stringr`
* Include a hint if your package provides extensions to an existing package or follows a certain philosophy.
    + `ggrepel`
    + `tidytext`

Eventually, you will want to rename your Project 2 `R` package. Check out Nick Tierney's [blog post](https://www.njtierney.com/post/2017/10/27/change-pkg-name/) on finding all the places in the package where the name will need to be updated.

---

### DESCRIPTION

* Provides overall metadata about your package.
* If a folder has a `DESCRIPTION` file, then RStudio assumes it is a package (and so gives you a `Build` pane).

```r
Package: Insert the package name
Title: What the Package Does (One Line, Title Case)
Description: One paragraph. If your description spans multiple lines, each
    line must be no more than 80 characters wide. Indent subsequent lines
    with 4 spaces.
Authors@R: c(
    person("First", "Last", , "first.last@example.com", role = c("aut", "cre")),
    person("First", "Last", , "first.last@example.com", role = "aut"))
```

* `cre` = maintainer
* `aut` = author
* `ctb` = contributor
* `cph` = copyright holder
* `fnd` = funder

---

### DESCRIPTION -- `License`

* `License` is a required field and must be given in a standard form.
* If you don't have a license, no one is allowed to copy your code without your permission.
* Make sure your license:
    + Declares how you want your code to be used.
    + Respects the license of code and data that your package uses.
* General categories:
    + **Permissive**: code can be freely copied, modified, and published but the license must be preserved.
        + EX: `usethis::use_mit_license()`
    + **Copyleft**: code can be freely copied and modified but if then published, it must use the same license as the original code.
        + EX: `usethis::use_gpl_license()`
    + **Data**: provide data with minimal restrictions.
        + EX: `usethis::use_cc0_license()`
    + **Data with attribution**: provide data but require attribution.
        + EX: `usethis::use_ccby_license()`
* Your license does not need to be compatible with the licenses of the R packages you import.

---

### DESCRIPTION

* `Imports`: List all the packages that your package depends on.
* `Suggests`: List all packages that are needed for development tasks or for optional functionality.
* `Version`: Communicate where your package is in its lifecycle.
* `LazyData: true`: Makes data more immediately available.
* Provides most of the information displayed on a package's CRAN page: [https://cran.r-project.org/web/packages/forcats/index.html](https://cran.r-project.org/web/packages/forcats/index.html)

---

### NAMESPACE

* You won't edit this directly.
* Specifies the functions your package makes available to the user.
    + `export(top10)`

---

### Documentation -- Rd Files

* `.Rd` stands for R documentation
    + Syntax based loosely on LaTeX
    + Don't edit directly.
    + Instead add `roxygen2` comments above the code for each function.

#### Workflow:

* Add `roxygen2` comments to your `.R` scripts.
* Run `devtools::document()` to create/update the `.Rd` files.
* Preview the document with `?function`.

---

### Documentation -- Rd Files

Language:

* **Block**: The `roxygen2` comments above a function.
* **Tag**:
    + `@tagName tagValue`
* **Introduction**: Text before the first **tag**.
* **Description**: Next paragraph.

```r
#' Create a data frame with just the 10 most common categories
#'
#' `top10()` filters a data frame to the rows corresponding to the 10 most common values of a variable.
#'
#' @param data A data frame.
#' @param x The variable in the data frame to filter on.
#'
#' @returns The filtered data frame.
#' @examples
#' top10(camDogs, Dog_Name)
#' @export
```

---

### Documentation -- Rd Files

* `@param`: Succinct summary of the allowed inputs and what the parameter does.
    + Most important component of the function documentation
    + Provide defaults.
    + If there is a fixed set of values, list them.
* `@inheritParams` allows you to inherit argument documentation from another function.
* `@returns`: Describe the output object (possibly even its dimensions).
* `@examples`: Showcase the most important features with self-contained code.

```r
#' Create a data frame with just the 10 most common categories
#'
#' `top10()` filters a data frame to the rows corresponding to the 10 most common values of a variable.
#'
#' @param data A data frame.
#' @param x The variable in the data frame to filter on.
#'
#' @returns The filtered data frame.
#' @examples
#' top10(camDogs, Dog_Name)
#' @export
```

---

### Documentation -- Rd Files

For data files:

* `@format` gives an overview of the dataset.
    + Describe each variable.
    + Include units when it makes sense.
    + Include the possible categories for categorical variables with only a few categories.
* `@source` provides details on where you got the data, often a URL.
* Never `@export` a data set.
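
A minimal sketch of what this might look like for `camDogs`, assuming the documentation lives in a file such as `R/data.R`. Only `Dog_Name` and the source URL come from these slides; the `Dog_Breed` column and all descriptions are hypothetical placeholders, so the real dataset may well differ.

```r
#' Dogs of Cambridge
#'
#' Data on dogs in Cambridge, MA, shared with this package.
#'
#' @format A data frame. The variables listed here are illustrative:
#' \describe{
#'   \item{Dog_Name}{Name of the dog.}
#'   \item{Dog_Breed}{Hypothetical column: breed of the dog.}
#' }
#' @source <https://data.cambridgema.gov/General-Government/Dogs-of-Cambridge/sckh-3xyx>
"camDogs"
```

The quoted dataset name at the end takes the place of a function definition, and there is no `@export` tag.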
---

### Documentation -- The Readme

* Why should I use this package?
* How do I access the package?
* How do I use the package?
* Let's look at some excellent Readmes:
    + [`palmerpenguins`](https://github.com/allisonhorst/palmerpenguins)
    + [`stringr`](https://github.com/tidyverse/stringr)
    + [`gganimate`](https://github.com/thomasp85/gganimate)

---

### Documentation -- The Readme

* To get started, run `usethis::use_readme_rmd()`.

Workflow:

* Edit the `README.Rmd`, not the `README.md` file so that you can include `R` chunks.
* To update the `README.md`, run `devtools::build_readme()`.
* Commit and push changes so that the Readme (landing page) of your package on GitHub is up-to-date!

---

### Package Development

Still to come:

* Testing, testing, testing!
* More documentation
    + Vignettes!
* Dissemination
    + A website!