background-image: url("img/logo_padded.001.jpeg")
background-position: left
background-size: 60%
class: middle, center

.pull-right[

<br>

## .base_color[Data Viz]
## .base_color[Considerations]

<br>
<br>

### .navy[Kelly McConville]

#### .navy[ Stat 108 | Week 1 | Spring 2023]

]

---

## Announcements

* No section, no lecture quiz, no p-set this week.
* Only I will be running office hours this week at the following times:
    + Wednesday 1:30 - 3:00 pm in Science Center 316
    + Thursday 10:30 - 11:30 am in Science Center 316 (this week only)
* Hop into our course Slack

***************************************

## Week 1 Goals

.pull-left[

**Day 1 Lecture**

* Course overview

]

.pull-right[

**Day 2 Lecture**

* Develop language to talk about the components of a graphic
* Discuss considerations for good graphical design

]

---
class: middle, center

## Let's start with the language we will use to describe the components of a graph:

--

## The Grammar of Graphics

---

## Background

.left-column[

<img src="img/Leland_Wilkinson.png" width="100%" style="display: block; margin: auto;" />

]

.right-column[

<br>

Leland Wilkinson wrote a book called "The Grammar of Graphics"

<br>
<br>

]

--

.left-column[

<img src="img/hadley.jpg" width="100%" style="display: block; margin: auto;" />

]

<br>
<br>
<br>
<br>
<br>
<br>
<br>

As part of his PhD in Statistics at Iowa State, Hadley Wickham wrote the `R` package `ggplot2`, which we will use to create static graphs.

---

## The Grammar of Graphics

.pull-left[

* **data**: dataset that contains the data
* **geom**: geometric shape that the data are mapped to
    + point, line, bar, text, ...
* **aes**thetic: visual properties of the **geom**
    + x position, y position, color, fill, shape
* **coord**: coordinate system
    + Cartesian, polar, geographic
* **scale**: controls how data are mapped to the visual values of the aesthetic
    + EX: particular colors, linear
* **guide**: legend to help user convert visual display back to the data

]

.pull-right[

<img src="img/layers.png" width="100%" style="display: block; margin: auto;" />

]

---

## Geoms versus Names

* What are the names of these graphs?

<img src="img/bars.png" width="100%" style="display: block; margin: auto;" />

* Focus on the **shapes** (i.e. geoms) and how the variables are **mapped** to those shapes.

---

## [Data Viz Example](https://www.nytimes.com/2019/04/11/learning/whats-going-on-in-this-graph-april-17-2019.html)

* Let's practice deconstructing this graph using the grammar of graphics.

.left-column[

* Geom(s)?
* **Aesthetic**s of the **geom**?
    + Mapping of variables?
* Coord?
* Scales?

]

.right-column[

<img src="img/baseball.png" width="70%" style="display: block; margin: auto;" />

]

---

## [Data Viz Example](https://www.nytimes.com/interactive/2018/01/23/climate/trump-offshore-oil-drilling.html)

.left-column[

* Geom(s)?
* **Aesthetic**s of the **geom**?
    + Mapping of variables?
* Coord?
* Scales?

]

.right-column[

<img src="img/oil_graphic.png" width="85%" style="display: block; margin: auto;" />

]

---

### Choices

For most data, there won't be just **one** way to graph it. Decisions to be made:

--

* What **geom** to use
    + Point, line, bar, ...

--

* For a given geom, how to map variables to its **aesthetics**
    + Size, location, color, ...

--

* For each **aesthetic**, what scale to use
    + Linear, diverging colors, ...

--

* For the graph, what **coordinate system** to use
    + Cartesian, polar

Let's discuss some considerations that can help guide these decisions. But...

--

> "Data visualization is part art and part science.
The challenge is to get the art right without getting the science wrong and vice versa." -- Claus Wilke

--

#### Recommendation: Try out different options and make sure to iterate!

---
class: middle, center

### Consideration: Consider variable type when picking the aesthetic mapping.

--

### What aesthetic options do I have at my disposal?

---

## Aesthetics: [Position/location](https://www.nytimes.com/2019/04/11/learning/whats-going-on-in-this-graph-april-17-2019.html)

<img src="img/baseball.png" width="55%" style="display: block; margin: auto;" />

---

## Aesthetics: [Length](https://www.nature.com/articles/d41586-019-03305-w)

<img src="img/female_authors.png" width="50%" style="display: block; margin: auto;" />

Hat tip: Grace Benson

---

## Aesthetics: [Area](https://www.nytimes.com/2022/04/07/learning/whats-going-on-in-this-graph-april-13-2022.html)

<img src="img/pandemic-spending.png" width="70%" style="display: block; margin: auto;" />

---

## Aesthetics: [Angle](https://www.nytimes.com/2018/09/25/learning/whats-going-on-in-this-graph-sept-26-2018.html)

.pull-left[

<img src="img/pie-chart-dog1.png" width="90%" style="display: block; margin: auto;" />

]

.pull-right[

<img src="img/pie-chart-dog2.png" width="100%" style="display: block; margin: auto;" />

]

---

## Aesthetics: [Shapes](https://clauswilke.com/dataviz/telling-a-story.html/)

<img src="img/pets.png" width="70%" style="display: block; margin: auto;" />

---

## Aesthetics: [Color Shade](https://fivethirtyeight.com/features/how-you-view-climate-change-might-depend-on-where-you-live/)

<img src="img/climate-action.png" width="60%" style="display: block; margin: auto;" />

---

## Aesthetics: [Color Hue](https://www.nytimes.com/2022/03/03/learning/whats-going-on-in-this-graph-march-9-2022.html)

<img src="img/success.png" width="45%" style="display: block; margin: auto;" />

---

## [Which represents the larger value?](http://stat405.had.co.nz/lectures/20-effective-vis.pdf)

<div class="figure" style="text-align: center">
<img
src="img/larger1.png" alt="From Wickham (2012)" width="60%" />
<p class="caption">From Wickham (2012)</p>
</div>

---

## [Which represents the larger value?](http://stat405.had.co.nz/lectures/20-effective-vis.pdf)

<div class="figure" style="text-align: center">
<img src="img/larger2.png" alt="From Wickham (2012)" width="60%" />
<p class="caption">From Wickham (2012)</p>
</div>

---

## [Which represents the larger value?](http://stat405.had.co.nz/lectures/20-effective-vis.pdf)

<div class="figure" style="text-align: center">
<img src="img/larger3.png" alt="From Wickham (2012)" width="60%" />
<p class="caption">From Wickham (2012)</p>
</div>

---

## [Which represents the larger value?](http://stat405.had.co.nz/lectures/20-effective-vis.pdf)

<div class="figure" style="text-align: center">
<img src="img/larger4.png" alt="From Wickham (2012)" width="60%" />
<p class="caption">From Wickham (2012)</p>
</div>

---

## [Which represents the larger value?](http://stat405.had.co.nz/lectures/20-effective-vis.pdf)

<div class="figure" style="text-align: center">
<img src="img/larger5.png" alt="From Wickham (2012)" width="60%" />
<p class="caption">From Wickham (2012)</p>
</div>

---

## [Which represents the larger value?](http://stat405.had.co.nz/lectures/20-effective-vis.pdf)

<div class="figure" style="text-align: center">
<img src="img/larger7.png" alt="From Wickham (2012)" width="60%" />
<p class="caption">From Wickham (2012)</p>
</div>

---

### Consideration: Consider variable type when picking the aesthetic mapping.

* Some aesthetics are ordinal. Some are not. Some can be both!

--

* Color palettes, for example, can be:
    + **Sequential**: Ordered data with one direction
    + **Diverging**: Ordered data with two directions
    + **Qualitative**: No order to the data

<img src="img/cb_palettes.001.jpeg" width="75%" style="display: block; margin: auto;" />

---

### Consideration: Consider variable type when picking the aesthetic mapping.
* Our ability to perceive differences varies by aesthetic!

.pull-left[

<img src="stat108_wk01wed_files/figure-html/unnamed-chunk-22-1.png" width="576" style="display: block; margin: auto;" />

]

.pull-right[

<img src="stat108_wk01wed_files/figure-html/unnamed-chunk-23-1.png" width="576" style="display: block; margin: auto;" />

]

---

### Consideration: [Respect the principle of proportional ink.](https://patient.info/news-and-features/whats-the-average-height-for-men)

**Principle of proportional ink:** The sizes of shaded areas need to be proportional to the data values they represent.

<img src="img/men_height.png" width="40%" style="display: block; margin: auto;" />

---

### Consideration: [Respect the principle of proportional ink.](https://www.businessinsider.com/the-top-10-most-read-books-in-the-world-infographic-2012-12)

**Principle of proportional ink:** The sizes of shaded areas need to be proportional to the data values they represent.

<img src="img/books.png" width="35%" style="display: block; margin: auto;" />

--

* Bars on a linear scale should start at 0.

---

### Considerations: [Respect the principle of proportional ink.](https://www.nytimes.com/2020/10/08/learning/whats-going-on-in-this-graph-consumer-spending-during-the-pandemic.html)

**Principle of proportional ink:** The sizes of shaded areas need to be proportional to the data values they represent.

<img src="img/spending.png" width="45%" style="display: block; margin: auto;" />

* Bars on a linear scale should start at 0.

---

### Considerations: [Respect the principle of proportional ink.](https://fivethirtyeight.com/features/how-you-view-climate-change-might-depend-on-where-you-live/)

Difficult to respect with spatial data. Why?
<img src="img/climate-action.png" width="50%" style="display: block; margin: auto;" />

---

### Considerations: [Respect the principle of proportional ink.](https://projects.fivethirtyeight.com/gop-house-2022/)

Instead of using geographic boundaries, pick a standardized shape and place it "near" its geographic location.

<img src="img/waffle.png" width="50%" style="display: block; margin: auto;" />

---

### Considerations: [Try to have a high data-ink ratio.](https://vizdata.org/slides/01/01-layers-1.html#/section)

**Data-ink ratio**: "proportion of a graphic's ink devoted to the non-redundant display of data-information." -- Edward Tufte

.pull-left[

<img src="img/mean-area-decade-bar-1.png" width="90%" style="display: block; margin: auto;" />

]

.pull-right[

<img src="img/mean-area-decade-scatter-1.png" width="100%" style="display: block; margin: auto;" />

]

Hat tip: Mine Çetinkaya-Rundel

---

### Considerations: [Think carefully about context.](https://fivethirtyeight.com/features/what-ruth-bader-ginsburgs-death-could-mean-for-2020-and-the-supreme-court/)

.pull-left[

Consider including:

* Title (or Figure Caption)
* Subtitle with maker and data source
* Caption with key points

]

.pull-right[

<br>

* Legends/helpers (with units)
* Axis labels (with units)
* Other annotations or reference points

]

--

.left-column[

<br>
<br>

What to add depends **greatly** on the research question!

]

.right-column[

<img src="img/rbg.png" width="75%" style="display: block; margin: auto;" />

]

---

### Considerations: [Think carefully about context.](https://www.nytimes.com/2020/01/23/learning/whats-going-on-in-this-graph-jan-29-2020.html)

.pull-left[

Consider including:

* Title (or Figure Caption)
* Subtitle with maker and data source
* Caption with key points

]

.pull-right[

<br>

* Legends/helpers (with units)
* Axis labels (with units)
* Other annotations or reference points

]

.left-column[

<br>
<br>

Context should add both **memorability** and **clarity**.
]

.right-column[

<img src="img/sport_injuries.png" width="80%" style="display: block; margin: auto;" />

]

---

### Considerations: [Simplify as much as you can.](https://fivethirtyeight.com/features/ted-cruzs-general-election-strategy-is-wishful-thinking/)

.pull-left[

* Faceting is a great way to add another variable without over-complicating your graphic.
* But only add additional variables that are useful to the story!

]

.pull-right[

<img src="img/vote_more.png" width="80%" style="display: block; margin: auto;" />

]

---

### Considerations: Simplify as much as you can.

* Over-plotting is very common in the Age of Big Data!
* Example from my own work with the US Forest Inventory and Analysis Program

.pull-left[

<img src="stat108_wk01wed_files/figure-html/unnamed-chunk-34-1.png" width="576" style="display: block; margin: auto;" />

]

--

.pull-right[

<img src="stat108_wk01wed_files/figure-html/unnamed-chunk-35-1.png" width="576" style="display: block; margin: auto;" />

* Jitter the points.

]

---

### Considerations: Simplify as much as you can.

.pull-left[

<img src="stat108_wk01wed_files/figure-html/unnamed-chunk-36-1.png" width="576" style="display: block; margin: auto;" />

* Add transparency.

]

--

.pull-right[

<img src="stat108_wk01wed_files/figure-html/unnamed-chunk-37-1.png" width="576" style="display: block; margin: auto;" />

* Bin the data and then try a different `geom`.

]

Pros and cons of different approaches?

---

### Considerations: [Make important comparisons easy.](https://serialmentor.com/dataviz/visualizing-proportions.html)

* Which graph makes it easy to conclude that the ruling coalition (FDP + SPD) has a majority?
<div class="figure" style="text-align: center">
<img src="img/bundestag.png" alt="Wilke (2019)" width="80%" />
<p class="caption">Wilke (2019)</p>
</div>

---

### Considerations: [Make important comparisons easy.](https://serialmentor.com/dataviz/visualizing-proportions.html)

* Which graph makes it easy to see how a company's market share changes over time? (Warning: Fake data.)

<div class="figure" style="text-align: center">
<img src="img/trend.png" alt="Wilke (2019)" width="80%" />
<p class="caption">Wilke (2019)</p>
</div>

---

### Considerations: Make your graphs accessible!

* Not all `R` color palettes have been vetted for color blindness.

<img src="img/color_palettes.png" width="60%" style="display: block; margin: auto;" />

---

### Considerations: Make your graphs accessible!

* Not all `R` color palettes have been vetted for color blindness.

.pull-left[

<img src="img/red_blind.png" width="100%" style="display: block; margin: auto;" />

]

.pull-right[

<img src="img/green_blind.png" width="100%" style="display: block; margin: auto;" />

]

---

### Considerations: Make your graphs accessible!

* Color contrast also matters.
    + Shoot for a contrast ratio of 4.5 or higher between overlapping colors.

<img src="stat108_wk01wed_files/figure-html/unnamed-chunk-43-1.png" width="720" style="display: block; margin: auto;" />

.pull-left[

```r
library(coloratio)
# Contrast ratio of a dark magenta against a light gray background
cr_get_ratio("#A71F69", "#EBEBEB")
```

```
## [1] 5.78868
```

]

.pull-right[

```r
# Contrast ratio of a pale blue against the same light gray background
cr_get_ratio("#A4DBE8", "#EBEBEB")
```

```
## [1] 1.269455
```

]

---

### Considerations: Make your graphs accessible!

* Use white space to help separate elements.

<img src="stat108_wk01wed_files/figure-html/unnamed-chunk-46-1.png" width="936" style="display: block; margin: auto;" />

---

### Considerations: Make your graphs accessible!

* Use a large enough font size!
<img src="stat108_wk01wed_files/figure-html/unnamed-chunk-47-1.png" width="936" style="display: block; margin: auto;" />

---
class: middle

### Data Viz Considerations

* We could spend all semester on data viz principles.

--

* Be thoughtful and iterate.

--

* Let's spend some time considering the strengths and weaknesses of some graphs.
    + Pick out the:
        + Clearest story
        + Most memorable
        + Best overall

---

### Reminders

* No section, no lecture quiz, no p-set this week.
* Only I will be running office hours this week at the following times:
    + Wednesday 1:30 - 3:00 pm in Science Center 316
    + Thursday 10:30 - 11:30 am in Science Center 316 (this week only)
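
---

### Sketch: The grammar of graphics in `ggplot2`

The grammar components from earlier in the deck (data, geom, aesthetic, coord, scale, guide) each have a rough counterpart in `ggplot2` code. The following is only a minimal sketch, not code from the lecture: the built-in `mtcars` dataset, the chosen variables, and the `"Dark2"` palette are all assumptions picked for illustration.

```r
library(ggplot2)

# Illustrative only: mtcars and these variable choices are assumptions,
# not examples taken from the lecture.
p <- ggplot(
  data = mtcars,                       # data: the dataset
  mapping = aes(
    x = wt,                            # aesthetic: x position
    y = mpg,                           # aesthetic: y position
    color = factor(cyl)                # aesthetic: color hue (categorical)
  )
) +
  geom_point() +                       # geom: points
  scale_color_brewer(palette = "Dark2") +  # scale: qualitative color palette
  coord_cartesian() +                  # coord: Cartesian coordinate system
  guides(color = guide_legend(title = "Cylinders"))  # guide: legend

p  # printing the object draws the graph
```

Swapping a single layer, such as `geom_point()` for another geom, changes one component of the grammar while leaving the others in place.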
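
---

### Sketch: Why bars on a linear scale should start at 0

The principle of proportional ink can be checked with simple arithmetic. The sketch below uses two made-up values (assumptions, not data from any graph in this deck) and compares the ratio of the data values to the ratio of the bar lengths under two baselines. The helper `ink_ratio()` is a hypothetical name introduced here.

```r
# Hypothetical values for two bars; not taken from any graph in the deck.
values <- c(a = 50, b = 55)

# Ratio of bar lengths (the "ink") when bars are drawn from a given baseline.
ink_ratio <- function(values, baseline) {
  bar_lengths <- values - baseline
  unname(bar_lengths["b"] / bar_lengths["a"])
}

data_ratio      <- unname(values["b"] / values["a"])   # 55 / 50
ratio_zero      <- ink_ratio(values, baseline = 0)     # bars start at 0
ratio_truncated <- ink_ratio(values, baseline = 45)    # axis truncated at 45

data_ratio       # ratio of the data values
ratio_zero       # equals the data ratio, so ink is proportional
ratio_truncated  # (55 - 45) / (50 - 45), so bar b looks twice as long as bar a
```

With a baseline of 0 the ink ratio equals the data ratio; with the truncated axis a 10% difference is drawn as a 100% difference.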
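
---

### Sketch: Where a contrast ratio comes from

For anyone curious what sits behind a number like the ones returned by `cr_get_ratio()`, here is a minimal base `R` sketch of the WCAG contrast-ratio formula. It is an assumption that this mirrors what `coloratio` computes internally; the helper names `relative_luminance()` and `contrast_ratio()` are hypothetical and introduced only for this illustration.

```r
# Relative luminance of a hex color, following the WCAG 2.x definition.
relative_luminance <- function(hex) {
  rgb <- as.vector(grDevices::col2rgb(hex)) / 255   # red, green, blue in 0-1
  # Undo the sRGB gamma encoding for each channel
  linear <- ifelse(rgb <= 0.03928,
                   rgb / 12.92,
                   ((rgb + 0.055) / 1.055)^2.4)
  # Weighted sum: green contributes most to perceived brightness
  sum(c(0.2126, 0.7152, 0.0722) * linear)
}

# Contrast ratio: (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
contrast_ratio <- function(col_1, col_2) {
  lum <- c(relative_luminance(col_1), relative_luminance(col_2))
  (max(lum) + 0.05) / (min(lum) + 0.05)
}

ratio_dark  <- contrast_ratio("#A71F69", "#EBEBEB")
ratio_light <- contrast_ratio("#A4DBE8", "#EBEBEB")

ratio_dark  >= 4.5   # does the dark magenta clear the 4.5 guideline?
ratio_light >= 4.5   # does the pale blue?
```

The two results should land close to the `coloratio` values shown earlier in the deck for the same color pairs.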