background-image: url("img/logo_padded.001.jpeg")
background-position: left
background-size: 60%
class: middle, center,

.pull-right[

<br>

## .base_color[Animation with `gganimate`]
## .base_color[and]
## .base_color[Data Wrangling with `dplyr`]

#### .navy[Kelly McConville]

#### .navy[ Stat 108 | Week 3 | Spring 2023]

]

---

## Announcements

* Need GitHub usernames ASAP: Fill out [this short form](https://forms.gle/H67c4XftzcD9e1vr7) if you haven't already.
* P-Set 1 due at 5pm on Wed.

---

## Week 3 Goals

.pull-left[

**Mon Lecture**

* Animation with `gganimate`
* Data wrangling with `dplyr`

]

.pull-right[

**Wed Lecture**

* GitHub/git
* RStudio Projects
* Basic interactivity with `plotly`
    + More advanced interactivity to come later on.

]

---

## Move Over Nate Silver

* [Hans Rosling](https://en.wikipedia.org/wiki/Hans_Rosling)
    + [The Joy of Stats](https://vimeo.com/18477762)

--

<img src="img/rosling_and_mcconville.JPG" width="60%" style="display: block; margin: auto;" />

---

### Stat 108 Names

<img src="stat108_wk03mon_files/figure-html/unnamed-chunk-2-1.gif" style="display: block; margin: auto;" />

---

### Volcanic Eruptions in the Aleutian Islands from 2000+

<img src="stat108_wk03mon_files/figure-html/unnamed-chunk-4-1.gif" style="display: block; margin: auto;" />

---

### Data wrangling needed!

* Both data frames need to be wrangled before they are ready to be turned into animated graphs.

.pull-left[

```r
library(babynames)
head(babynames)
```

```
## # A tibble: 6 × 5
##    year sex   name          n   prop
##   <dbl> <chr> <chr>     <int>  <dbl>
## 1  1880 F     Mary       7065 0.0724
## 2  1880 F     Anna       2604 0.0267
## 3  1880 F     Emma       2003 0.0205
## 4  1880 F     Elizabeth  1939 0.0199
## 5  1880 F     Minnie     1746 0.0179
## 6  1880 F     Margaret   1578 0.0162
```

]

.pull-right[

```r
head(babynames_stat108)
```

```
## # A tibble: 6 × 3
##    year name       n
##   <dbl> <chr>  <int>
## 1  1880 Omar      14
## 2  1880 Serena    13
## 3  1880 Tony      42
## 4  1881 Audrey    11
## 5  1881 Omar      11
## 6  1881 Serena    13
```

]

---

### Data wrangling needed!

* Both data frames need to be wrangled before they are ready to be turned into animated graphs.

```r
library(tidyverse)  # read_csv() and the wrangling verbs used below
eruptions <- read_csv("https://raw.githubusercontent.com/harvard-stat108s23/materials/main/psets/data/GVP_Eruption_Results.csv")
head(eruptions)
```

```
## # A tibble: 6 × 24
##   Volcan…¹ Volca…² Erupt…³ Erupt…⁴ Areao…⁵   VEI VEIMo…⁶ Start…⁷ Start…⁸ Start…⁹
##      <dbl> <chr>     <dbl> <chr>   <chr>   <dbl> <chr>   <chr>     <dbl>   <dbl>
## 1   273010 "Bulus…   22114 Confir… <NA>       NA <NA>    <NA>       2016      NA
## 2   290390 "Alaid"   22116 Confir… <NA>       NA <NA>    <NA>       2016      NA
## 3   345070 "Turri…   22118 Confir… <NA>       NA <NA>    <NA>       2016      NA
## 4   343100 "San M…   22117 Confir… <NA>       NA <NA>    <NA>       2016      NA
## 5   357070 "Chill…   22119 Confir… <NA>       NA <NA>    <NA>       2016      NA
## 6   266030 "Soput…   22105 Confir… <NA>       NA <NA>    <NA>       2016      NA
## # … with 14 more variables: StartMonth <dbl>, StartDayModifier <chr>,
## #   StartDay <dbl>, StartDayUncertainty <dbl>, `EvidenceMethod(dating)` <chr>,
## #   EndYearModifier <chr>, EndYear <dbl>, EndYearUncertainty <lgl>,
## #   EndMonth <dbl>, EndDayModifier <chr>, EndDay <dbl>,
## #   EndDayUncertainty <dbl>, Latitude <dbl>, Longitude <dbl>, and abbreviated
## #   variable names ¹VolcanoNumber, ²VolcanoName, ³EruptionNumber,
## #   ⁴EruptionCategory, ⁵AreaofActivity, ⁶VEIModifier, ⁷StartYearModifier, …
```

---

### Data wrangling needed!

* Both data frames need to be wrangled before they are ready to be turned into animated graphs.

```r
head(eruptions_aleutian)
```

```
## # A tibble: 6 × 5
##   VolcanoName Latitude Longitude Start   End
##   <chr>          <dbl>     <dbl> <dbl> <dbl>
## 1 Cleveland       52.8     -170.  5643  5742
## 2 Pavlof          55.4     -162.  5429  5432
## 3 Pavlof          55.4     -162.  5264  5266
## 4 Shishaldin      54.8     -164.  5151  5771
## 5 Cleveland       52.8     -170.  5110  5269
## 6 Veniaminof      56.2     -159.  4912  5033
```

---

## [Data In the Wild](https://www.bts.gov/content/age-and-availability-amtrak-locomotive-and-car-fleets)

Unfortunately, many datasets on the internet are in **Display Format**, not **Analysis Format**.

<img src="img/amtrak.png" width="60%" style="display: block; margin: auto;" />

---

## Data Wrangling

Unfortunately, many datasets on the internet are in **Display Format**, not **Analysis Format**.

```r
library(readxl)
url <- "https://www.bts.gov/sites/bts.dot.gov/files/table_01_33_102020.xlsx"
destfile <- "table_01_33_102020.xlsx"
curl::curl_download(url, destfile)
table_01_33_102020 <- read_excel(destfile)
table_01_33_102020
```

```
## # A tibble: 20 × 35
##    Table 1-…¹ ...2   ...3  ...4  ...5  ...6  ...7  ...8  ...9 ...10 ...11
##    <chr>      <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 <NA>       1972   1975  1980  1985  1990  1991  1992  1993  1994  1995
##  2 Locomotiv… <NA>     NA    NA    NA    NA    NA    NA    NA    NA    NA
##  3 Percent a… U        87    83    93    84    86    83    84    85    88
##  4 Average a… 22.3   14.4   7.4     7    12    13    13  13.2  13.4  13.9
##  5 Passenger… <NA>     NA    NA    NA    NA    NA    NA    NA    NA    NA
##  6 Percent a… U        82    77    90    90    92    90    89    88    90
##  7 Average a… 22     24.7  14.3  14.2    20    21  21.5  22.6  22.4  21.8
##  8 KEY: U =…  <NA>     NA    NA    NA    NA    NA    NA    NA    NA    NA
##  9 <NA>       <NA>     NA    NA    NA    NA    NA    NA    NA    NA    NA
## 10 a Year-en… <NA>     NA    NA    NA    NA    NA    NA    NA    NA    NA
## 11 b Fiscal … <NA>     NA    NA    NA    NA    NA    NA    NA    NA    NA
## 12 <NA>       <NA>     NA    NA    NA    NA    NA    NA    NA    NA    NA
## 13 NOTES      <NA>     NA    NA    NA    NA    NA    NA    NA    NA    NA
## 14 1972 was … <NA>     NA    NA    NA    NA    NA    NA    NA    NA    NA
## 15 Roadraile… <NA>     NA    NA    NA    NA    NA    NA    NA    NA    NA
## 16 <NA>       <NA>     NA    NA    NA    NA    NA    NA    NA    NA    NA
## 17 SOURCES    <NA>     NA    NA    NA    NA    NA    NA    NA    NA    NA
## 18 1972-80: … <NA>     NA    NA    NA    NA    NA    NA    NA    NA    NA
## 19 1985-2000… <NA>     NA    NA    NA    NA    NA    NA    NA    NA    NA
## 20 2001-19: … <NA>     NA    NA    NA    NA    NA    NA    NA    NA    NA
## # … with 24 more variables: ...12 <dbl>, ...13 <dbl>, ...14 <dbl>, ...15 <dbl>,
## #   ...16 <dbl>, ...17 <chr>, ...18 <chr>, ...19 <dbl>, ...20 <dbl>,
## #   ...21 <dbl>, ...22 <dbl>, ...23 <dbl>, ...24 <dbl>, ...25 <dbl>,
## #   ...26 <dbl>, ...27 <dbl>, ...28 <dbl>, ...29 <dbl>, ...30 <dbl>,
## #   ...31 <dbl>, ...32 <chr>, ...33 <chr>, ...34 <chr>, ...35 <chr>, and
## #   abbreviated variable name
## #   ¹`Table 1-33: Age and Availability of Amtrak Locomotive and Car Fleets`
```

---

## Data Wrangling

* What is it?
    + Any processing you have to do to the data to summarize, visualize, or model it.
    + Examples:
        + Recoding 999 as `NA`
        + Removing rows
        + Creating new variables
        + Recoding categorical variables
        + Fixing variable types
        + Reshaping data into a format that satisfies the tidy data principles

---

## Data Wrangling

.pull-left[

* We are going to learn **several** packages to help us wrangle data.
* Need `dplyr` right now!

]

.pull-right[

<img src="img/data_wrangling_packages.010.jpeg" width="100%" style="display: block; margin: auto;" />

]

---

## Stat 108 `dplyr` Experience

<img src="stat108_wk03mon_files/figure-html/unnamed-chunk-12-1.png" width="60%" style="display: block; margin: auto;" />

---

### dplyr for Data Wrangling

* Common wrangling verbs:
    + `select()`
    + `mutate()`
    + `filter()`
    + `arrange()`
    + `summarize()`
    + `drop_na()`
* One action:
    + `group_by()`

### Pipe for chaining together commands

* `magrittr` pipe: `%>%`
* New base `R` (native) pipe: `|>`

---

### Wrangling `babynames`

#### What wrangling do we need to do to get from raw data to wrangled data?

.pull-left[

```r
library(babynames)
head(babynames)
```

```
## # A tibble: 6 × 5
##    year sex   name          n   prop
##   <dbl> <chr> <chr>     <int>  <dbl>
## 1  1880 F     Mary       7065 0.0724
## 2  1880 F     Anna       2604 0.0267
## 3  1880 F     Emma       2003 0.0205
## 4  1880 F     Elizabeth  1939 0.0199
## 5  1880 F     Minnie     1746 0.0179
## 6  1880 F     Margaret   1578 0.0162
```

]

.pull-right[

```r
head(babynames_stat108)
```

```
## # A tibble: 6 × 3
##    year name       n
##   <dbl> <chr>  <int>
## 1  1880 Omar      14
## 2  1880 Serena    13
## 3  1880 Tony      42
## 4  1881 Audrey    11
## 5  1881 Omar      11
## 6  1881 Serena    13
```

]

---

### Wrangling `babynames`

`filter()`: Subset rows based on logical conditions.

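To see the row logic in isolation first, here is a minimal sketch on a made-up tibble (`toy_names` and its values are hypothetical, not course data). Conditions separated by commas are combined with AND, so a row is kept only if it passes every one.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy_names <- tibble(
  name = c("Ana", "Ben", "Cy", "Ana", "Ben"),
  year = c(2000, 2000, 2000, 2001, 2001),
  n    = c(10, 5, 7, 12, 3)
)

# Keep rows whose name is in the set AND whose count is at least 5
toy_subset <- filter(toy_names, name %in% c("Ana", "Ben"), n >= 5)
toy_subset
```

The same kind of call on the full `babynames` data:
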
```r
babynames_stat108 <- filter(babynames,
                            name %in% c("Serena", "Hailey", "Tony",
                                        "Audrey", "Omar", "Ian"))
babynames_stat108
```

```
## # A tibble: 1,053 × 5
##     year sex   name       n      prop
##    <dbl> <chr> <chr>  <int>     <dbl>
##  1  1880 F     Serena    13 0.000133
##  2  1880 M     Tony      42 0.000355
##  3  1880 M     Omar      14 0.000118
##  4  1881 F     Serena    13 0.000132
##  5  1881 F     Audrey    11 0.000111
##  6  1881 M     Tony      36 0.000332
##  7  1881 M     Omar      11 0.000102
##  8  1882 F     Serena    14 0.000121
##  9  1882 F     Audrey     7 0.0000605
## 10  1882 M     Tony      46 0.000377
## # … with 1,043 more rows
```

---

### Wrangling `babynames`

`%>%` (the pipe): Passes the result on its left as the first argument to the function on its right.

```r
babynames_stat108 <- filter(babynames,
                            name %in% c("Serena", "Hailey", "Tony",
                                        "Audrey", "Omar", "Ian"))
```

```r
babynames_stat108 <- babynames %>%
  filter(name %in% c("Serena", "Hailey", "Tony",
                     "Audrey", "Omar", "Ian"))
babynames_stat108
```

```
## # A tibble: 1,053 × 5
##     year sex   name       n      prop
##    <dbl> <chr> <chr>  <int>     <dbl>
##  1  1880 F     Serena    13 0.000133
##  2  1880 M     Tony      42 0.000355
##  3  1880 M     Omar      14 0.000118
##  4  1881 F     Serena    13 0.000132
##  5  1881 F     Audrey    11 0.000111
##  6  1881 M     Tony      36 0.000332
##  7  1881 M     Omar      11 0.000102
##  8  1882 F     Serena    14 0.000121
##  9  1882 F     Audrey     7 0.0000605
## 10  1882 M     Tony      46 0.000377
## # … with 1,043 more rows
```

---

### Wrangling `babynames`

`summarize()`: Compute an aggregation measure (e.g., mean, standard deviation).

```r
babynames_stat108 <- babynames %>%
  filter(name %in% c("Serena", "Hailey", "Tony",
                     "Audrey", "Omar", "Ian")) %>%
  summarize(n = sum(n))
babynames_stat108
```

```
## # A tibble: 1 × 1
##         n
##     <int>
## 1 1047778
```

---

### Wrangling `babynames`

`group_by()`: Define groups so that later operations are computed within each group.

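To see what grouping changes, here is a minimal sketch on a made-up tibble (`toy_names` and its values are hypothetical, not course data). Without groups, `summarize()` collapses the whole table to one row; after `group_by()`, it returns one row per group.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy_names <- tibble(
  name = c("Ana", "Ana", "Ben", "Ben", "Ben"),
  sex  = c("F", "M", "M", "M", "F"),
  year = c(2000, 2000, 2000, 2001, 2001),
  n    = c(10, 2, 5, 3, 1)
)

# No groups: a single total for the whole table
toy_total <- toy_names %>%
  summarize(n = sum(n))

# One group per name: one total per name
toy_by_name <- toy_names %>%
  group_by(name) %>%
  summarize(n = sum(n))

# One group per year-name pair; ungroup() drops the leftover grouping
toy_by_year_name <- toy_names %>%
  group_by(year, name) %>%
  summarize(n = sum(n)) %>%
  ungroup()

toy_by_year_name
```

Applied to the class names:
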
```r
babynames_stat108 <- babynames %>%
  filter(name %in% c("Serena", "Hailey", "Tony",
                     "Audrey", "Omar", "Ian")) %>%
  group_by(name) %>%
  summarize(n = sum(n))
babynames_stat108
```

```
## # A tibble: 6 × 2
##   name        n
##   <chr>   <int>
## 1 Audrey 279747
## 2 Hailey 157201
## 3 Ian    222950
## 4 Omar    95654
## 5 Serena  40851
## 6 Tony   251375
```

* What are we missing?

---

### Wrangling `babynames`

`group_by()`: Create groups for how future operations should be computed.

```r
babynames_stat108 <- babynames %>%
  filter(name %in% c("Serena", "Hailey", "Tony",
                     "Audrey", "Omar", "Ian")) %>%
  group_by(year, name) %>%
  summarize(n = sum(n)) %>%
  ungroup()
babynames_stat108
```

```
## # A tibble: 711 × 3
##     year name       n
##    <dbl> <chr>  <int>
##  1  1880 Omar      14
##  2  1880 Serena    13
##  3  1880 Tony      42
##  4  1881 Audrey    11
##  5  1881 Omar      11
##  6  1881 Serena    13
##  7  1881 Tony      36
##  8  1882 Audrey     7
##  9  1882 Omar      11
## 10  1882 Serena    14
## # … with 701 more rows
```

---

### Wrangling [Eruptions](https://volcano.si.edu/database/search_eruption_results.cfm)

#### What wrangling do we need to do to get from raw data to wrangled data?

```r
head(eruptions)
```

```
## # A tibble: 6 × 24
##   Volcan…¹ Volca…² Erupt…³ Erupt…⁴ Areao…⁵   VEI VEIMo…⁶ Start…⁷ Start…⁸ Start…⁹
##      <dbl> <chr>     <dbl> <chr>   <chr>   <dbl> <chr>   <chr>     <dbl>   <dbl>
## 1   273010 "Bulus…   22114 Confir… <NA>       NA <NA>    <NA>       2016      NA
## 2   290390 "Alaid"   22116 Confir… <NA>       NA <NA>    <NA>       2016      NA
## 3   345070 "Turri…   22118 Confir… <NA>       NA <NA>    <NA>       2016      NA
## 4   343100 "San M…   22117 Confir… <NA>       NA <NA>    <NA>       2016      NA
## 5   357070 "Chill…   22119 Confir… <NA>       NA <NA>    <NA>       2016      NA
## 6   266030 "Soput…   22105 Confir… <NA>       NA <NA>    <NA>       2016      NA
## # … with 14 more variables: StartMonth <dbl>, StartDayModifier <chr>,
## #   StartDay <dbl>, StartDayUncertainty <dbl>, `EvidenceMethod(dating)` <chr>,
## #   EndYearModifier <chr>, EndYear <dbl>, EndYearUncertainty <lgl>,
## #   EndMonth <dbl>, EndDayModifier <chr>, EndDay <dbl>,
## #   EndDayUncertainty <dbl>, Latitude <dbl>, Longitude <dbl>, and abbreviated
## #   variable names ¹VolcanoNumber, ²VolcanoName, ³EruptionNumber,
## #   ⁴EruptionCategory, ⁵AreaofActivity, ⁶VEIModifier, ⁷StartYearModifier, …
```

---

### Wrangling [Eruptions](https://volcano.si.edu/database/search_eruption_results.cfm)

#### What wrangling do we need to do to get from raw data to wrangled data?

```r
head(eruptions_aleutian)
```

```
## # A tibble: 6 × 5
##   VolcanoName Latitude Longitude Start   End
##   <chr>          <dbl>     <dbl> <dbl> <dbl>
## 1 Cleveland       52.8     -170.  5643  5742
## 2 Pavlof          55.4     -162.  5429  5432
## 3 Pavlof          55.4     -162.  5264  5266
## 4 Shishaldin      54.8     -164.  5151  5771
## 5 Cleveland       52.8     -170.  5110  5269
## 6 Veniaminof      56.2     -159.  4912  5033
```

---

### Wrangling [Eruptions](https://volcano.si.edu/database/search_eruption_results.cfm)

* More `filter()`ing

```r
eruptions_aleutian <- eruptions %>%
  filter(Longitude > -172.164, Longitude < -157.1507,
         Latitude > 50.977, Latitude < 59.5617,
         StartYear >= 2000)
eruptions_aleutian
```

```
## # A tibble: 29 × 24
##    Volca…¹ Volca…² Erupt…³ Erupt…⁴ Areao…⁵   VEI VEIMo…⁶ Start…⁷ Start…⁸ Start…⁹
##      <dbl> <chr>     <dbl> <chr>   <chr>   <dbl> <chr>   <chr>     <dbl>   <dbl>
##  1  311240 Clevel…   21082 Confir… <NA>       NA <NA>    <NA>       2015      NA
##  2  312030 Pavlof    20975 Confir… <NA>        1 <NA>    <NA>       2014      NA
##  3  312030 Pavlof    20909 Confir… NE fla…    NA <NA>    <NA>       2014      NA
##  4  311360 Shisha…   20937 Confir… <NA>        1 <NA>    <NA>       2014      NA
##  5  311240 Clevel…   20845 Confir… <NA>        3 <NA>    <NA>       2013      NA
##  6  312070 Veniam…   20836 Confir… Wester…     3 <NA>    <NA>       2013      NA
##  7  312030 Pavlof    20807 Confir… Summit      3 <NA>    <NA>       2013      NA
##  8  311240 Clevel…   20758 Confir… Summit…     2 <NA>    <NA>       2011      NA
##  9  311240 Clevel…   19820 Uncert… <NA>        2 ?       <NA>       2010      NA
## 10  311240 Clevel…   19819 Confir… <NA>        2 <NA>    <NA>       2010      NA
## # … with 19 more rows, 14 more variables: StartMonth <dbl>,
## #   StartDayModifier <chr>, StartDay <dbl>, StartDayUncertainty <dbl>,
## #   `EvidenceMethod(dating)` <chr>, EndYearModifier <chr>, EndYear <dbl>,
## #   EndYearUncertainty <lgl>, EndMonth <dbl>, EndDayModifier <chr>,
## #   EndDay <dbl>, EndDayUncertainty <dbl>, Latitude <dbl>, Longitude <dbl>, and
## #   abbreviated variable names ¹VolcanoNumber, ²VolcanoName, ³EruptionNumber,
## #   ⁴EruptionCategory, ⁵AreaofActivity, ⁶VEIModifier, ⁷StartYearModifier, …
```

---

### Wrangling [Eruptions](https://volcano.si.edu/database/search_eruption_results.cfm)

* `mutate()`: Create new variables or modify existing variables

```r
library(lubridate)  # make_date() and as_date()
eruptions_aleutian <- eruptions %>%
  filter(Longitude > -172.164, Longitude < -157.1507,
         Latitude > 50.977, Latitude < 59.5617,
         StartYear >= 2000) %>%
  mutate(StartDate = make_date(StartYear, StartMonth, StartDay),
         EndDate = make_date(EndYear, EndMonth,
                             EndDay),
         Start = as.numeric(StartDate - as_date("2000-01-01")),
         End = as.numeric(EndDate - as_date("2000-01-01")))
eruptions_aleutian
```

```
## # A tibble: 29 × 28
##    Volca…¹ Volca…² Erupt…³ Erupt…⁴ Areao…⁵   VEI VEIMo…⁶ Start…⁷ Start…⁸ Start…⁹
##      <dbl> <chr>     <dbl> <chr>   <chr>   <dbl> <chr>   <chr>     <dbl>   <dbl>
##  1  311240 Clevel…   21082 Confir… <NA>       NA <NA>    <NA>       2015      NA
##  2  312030 Pavlof    20975 Confir… <NA>        1 <NA>    <NA>       2014      NA
##  3  312030 Pavlof    20909 Confir… NE fla…    NA <NA>    <NA>       2014      NA
##  4  311360 Shisha…   20937 Confir… <NA>        1 <NA>    <NA>       2014      NA
##  5  311240 Clevel…   20845 Confir… <NA>        3 <NA>    <NA>       2013      NA
##  6  312070 Veniam…   20836 Confir… Wester…     3 <NA>    <NA>       2013      NA
##  7  312030 Pavlof    20807 Confir… Summit      3 <NA>    <NA>       2013      NA
##  8  311240 Clevel…   20758 Confir… Summit…     2 <NA>    <NA>       2011      NA
##  9  311240 Clevel…   19820 Uncert… <NA>        2 ?       <NA>       2010      NA
## 10  311240 Clevel…   19819 Confir… <NA>        2 <NA>    <NA>       2010      NA
## # … with 19 more rows, 18 more variables: StartMonth <dbl>,
## #   StartDayModifier <chr>, StartDay <dbl>, StartDayUncertainty <dbl>,
## #   `EvidenceMethod(dating)` <chr>, EndYearModifier <chr>, EndYear <dbl>,
## #   EndYearUncertainty <lgl>, EndMonth <dbl>, EndDayModifier <chr>,
## #   EndDay <dbl>, EndDayUncertainty <dbl>, Latitude <dbl>, Longitude <dbl>,
## #   StartDate <date>, EndDate <date>, Start <dbl>, End <dbl>, and abbreviated
## #   variable names ¹VolcanoNumber, ²VolcanoName, ³EruptionNumber, …
```

---

### Wrangling [Eruptions](https://volcano.si.edu/database/search_eruption_results.cfm)

`select()`: Subset the variables

```r
eruptions_aleutian <- eruptions %>%
  filter(Longitude > -172.164, Longitude < -157.1507,
         Latitude > 50.977, Latitude < 59.5617,
         StartYear >= 2000) %>%
  mutate(StartDate = make_date(StartYear, StartMonth, StartDay),
         EndDate = make_date(EndYear, EndMonth, EndDay),
         Start = as.numeric(StartDate - as_date("2000-01-01")),
         End = as.numeric(EndDate - as_date("2000-01-01"))) %>%
  select(VolcanoName, Latitude, Longitude, Start, End)
eruptions_aleutian
```

```
## # A tibble: 29 × 5
##    VolcanoName Latitude Longitude Start   End
##    <chr>          <dbl>     <dbl> <dbl> <dbl>
##  1 Cleveland       52.8     -170.  5643  5742
##  2 Pavlof          55.4     -162.  5429  5432
##  3 Pavlof          55.4     -162.  5264  5266
##  4 Shishaldin      54.8     -164.  5151  5771
##  5 Cleveland       52.8     -170.  5110  5269
##  6 Veniaminof      56.2     -159.  4912  5033
##  7 Pavlof          55.4     -162.  4881  4925
##  8 Cleveland       52.8     -170.  4218  4881
##  9 Cleveland       52.8     -170.  3907  3921
## 10 Cleveland       52.8     -170.  3802  3805
## # … with 19 more rows
```

---

### Wrangling [Eruptions](https://volcano.si.edu/database/search_eruption_results.cfm)

`drop_na()`: Remove any row that has a missing value for particular variables

```r
eruptions_aleutian <- eruptions %>%
  filter(Longitude > -172.164, Longitude < -157.1507,
         Latitude > 50.977, Latitude < 59.5617,
         StartYear >= 2000) %>%
  mutate(StartDate = make_date(StartYear, StartMonth, StartDay),
         EndDate = make_date(EndYear, EndMonth, EndDay),
         Start = as.numeric(StartDate - as_date("2000-01-01")),
         End = as.numeric(EndDate - as_date("2000-01-01"))) %>%
  select(VolcanoName, Latitude, Longitude, Start, End) %>%
  drop_na(Start, End)
eruptions_aleutian
```

```
## # A tibble: 26 × 5
##    VolcanoName Latitude Longitude Start   End
##    <chr>          <dbl>     <dbl> <dbl> <dbl>
##  1 Cleveland       52.8     -170.  5643  5742
##  2 Pavlof          55.4     -162.  5429  5432
##  3 Pavlof          55.4     -162.  5264  5266
##  4 Shishaldin      54.8     -164.  5151  5771
##  5 Cleveland       52.8     -170.  5110  5269
##  6 Veniaminof      56.2     -159.  4912  5033
##  7 Pavlof          55.4     -162.  4881  4925
##  8 Cleveland       52.8     -170.  4218  4881
##  9 Cleveland       52.8     -170.  3907  3921
## 10 Cleveland       52.8     -170.  3802  3805
## # … with 16 more rows
```

---

class: middle, center

## Now we are ready to animate some graphs!

#### Will explore two examples and then explore more features of `gganimate` on P-Set 2!

---

## `gganimate`

```r
library(gganimate)
```

* Extends `ggplot2` to allow for animation.
    + Additional layers
* Core functions:
    + `transition_*()`: Defining the variables that control the change and how they control the change
    + `enter_*()`/`exit_*()`: Determining how data enters and exits
    + `view_*()`: Changing axes
    + `shadow_*()`: Giving the animation memory
    + `animate()`: Tuning the gif speed and size

---

## Create a Static Version First

.pull-left[

```r
p <- ggplot(data = babynames_stat108,
            mapping = aes(x = year, y = n, color = name)) +
  geom_line(size = 2) +
  theme(legend.position = "bottom",
        text = element_text(size=20)) +
  guides(color = guide_legend(nrow = 2)) +
  labs(y = "Number of Births", x = "Year",
       color = "Baby Name")
p
```

]

.pull-right[

<img src="stat108_wk03mon_files/figure-html/babynames1-1.png" width="768" style="display: block; margin: auto;" />

]

---

### Then Animate!

`transition_reveal()`: Adding each new frame on top of the previous frames

.pull-left[

```r
p_animate <- p +
  transition_reveal(along = year)
animate(p_animate, fps = 5, end_pause = 40,
        height = 4, width = 6.5,
        units = "in", res = 100)
```

]

.pull-right[

<img src="stat108_wk03mon_files/figure-html/unnamed-chunk-30-1.gif" style="display: block; margin: auto;" />

]

---

### Controlling the Animation

.pull-left[

```r
p_animate <- p +
  transition_reveal(along = year)
animate(p_animate, fps = 5, end_pause = 40,
        height = 4, width = 6.5,
        units = "in", res = 100)
```

]

.pull-right[

<img src="stat108_wk03mon_files/figure-html/unnamed-chunk-32-1.gif" style="display: block; margin: auto;" />

]

---

## Adding Animated Context

.pull-left[

```r
p_animate <- p +
  transition_reveal(year) +
  labs(title = "The Year is {round(frame_along, 0)}.")
animate(p_animate, fps = 5, end_pause = 40,
        height = 4, width = 6.5,
        units = "in", res = 100)
```

]

.pull-right[

<img src="stat108_wk03mon_files/figure-html/unnamed-chunk-34-1.gif" style="display: block; margin: auto;" />

]

---

## Create a Static Version First

.pull-left[

```r
library(ggmap)
aleutian_box <- c(bottom = 50.977, left = -172.164,
                  top = 59.5617,
                  right = -157.1507)
aleutian <- get_stamenmap(aleutian_box,
                          maptype = "watercolor",
                          zoom = 5)
p_aleutian <- aleutian %>%
  ggmap() +
  theme_void()
p_aleutian
```

]

.pull-right[

<img src="stat108_wk03mon_files/figure-html/vol1-1.png" width="768" style="display: block; margin: auto;" />

]

---

## Create a Static Version First

.pull-left[

```r
p_aleutian <- aleutian %>%
  ggmap() +
  geom_point(data = eruptions_aleutian,
             mapping = aes(Longitude, Latitude),
             color = "red", size = 8) +
  theme_void()
p_aleutian
```

]

.pull-right[

<img src="stat108_wk03mon_files/figure-html/vol2-1.png" width="768" style="display: block; margin: auto;" />

]

---

### Then Animate!

`transition_events()`: Data enter and exit at specified times

.pull-left[

```r
p_aleutian_an <- p_aleutian +
  transition_events(start = Start, end = End,
                    enter_length = 100,
                    exit_length = 100)
animate(p_aleutian_an, nframes = 100, fps = 2)
```

]

.pull-right[

<img src="stat108_wk03mon_files/figure-html/unnamed-chunk-38-1.gif" style="display: block; margin: auto;" />

]

---

### Impact how objects come and go

.pull-left[

```r
p_aleutian_an <- p_aleutian +
  transition_events(start = Start, end = End,
                    enter_length = 100,
                    exit_length = 100) +
  enter_grow() +
  exit_shrink() +
  labs(caption = "{ymd(\"2000-01-01\") + frame_time}")
animate(p_aleutian_an, nframes = 100, fps = 2)
```

]

.pull-right[

<img src="stat108_wk03mon_files/figure-html/unnamed-chunk-40-1.gif" style="display: block; margin: auto;" />

]

---

### Change the Geom

.pull-left[

```r
library(ggimage)
volcano_img <- "https://raw.githubusercontent.com/harvard-stat108s23/materials/main/img/volcano.png"
p_aleutian_an <- aleutian %>%
  ggmap() +
  geom_image(data = eruptions_aleutian,
             mapping = aes(Longitude, Latitude),
             image = volcano_img) +
  theme_void() +
  transition_events(start = Start, end = End,
                    enter_length = 100,
                    exit_length = 100) +
  enter_grow() +
  exit_shrink() +
  labs(caption = "{ymd(\"2000-01-01\") + frame_time}")
animate(p_aleutian_an, nframes = 100, fps = 2)
```

]

.pull-right[

<img src="stat108_wk03mon_files/figure-html/unnamed-chunk-42-1.gif" style="display: block; margin: auto;" />

]

Volcano icon: [Volcano icons created by Smashicons - Flaticon](https://www.flaticon.com/free-icons/volcano)

---

## Why Add Animation to a Graph?

--

* To engage the viewer
* To accentuate the story
* To add another variable to the plot

But don't add animation just because you can. Drawbacks?

--

* Requires a higher level of attention
* Can obscure the story

---

### Reminders

* Need GitHub usernames ASAP: Fill out [this short form](https://forms.gle/H67c4XftzcD9e1vr7) if you haven't already.
* P-Set 1 due at 5pm on Wed.
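
---

### Aside: `Start` and `End` as Day Offsets

The `mutate()` step stores `Start` and `End` as days since 2000-01-01, and the caption `{ymd("2000-01-01") + frame_time}` turns a frame's offset back into a date. A minimal sketch of that round trip, assuming base R's `as.Date()` as a stand-in for the `lubridate` helpers used on the slides:

```r
# Reference date used throughout the eruptions wrangling
origin <- as.Date("2000-01-01")

# Offset -> date: the same arithmetic the caption performs for a frame
start_offset <- 5643  # first Start value shown in eruptions_aleutian
start_date <- origin + start_offset
start_date

# Date -> offset: the same arithmetic the mutate() step performs
offset_again <- as.numeric(start_date - origin)
offset_again
```

Storing plain numbers gives `transition_events()` a numeric timeline, while the caption still shows readers a calendar date.
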