background-image: url("img/DAW.png")
background-position: left
background-size: 50%
class: middle, center, inverse

.pull-right[

## .whitish[Statistical Thinking]

<br>

<br>

### .whitish[Kelly McConville]

#### .yellow[ Stat 100 | Week 1 | Spring 2022]

]

---
class: inverse

## Getting Started in Stat 100

.pull-left[

<img src="img/modules.png" width="85%" style="display: block; margin: auto;" />

]

.pull-right[

* Complete the [Background Survey](https://forms.gle/FdYRHxULGxcAZy2N6).
    + Due at the same time as P-Set 1.
* Watch the overview video.
* Read over the syllabus.
* Check out the [Office Hours Schedule](https://canvas.harvard.edu/courses/102162/pages/check-out-the-office-hours-schedule).

]

---

## Announcements

* Lectures are online this week.
* Lecture slide decks will always be posted and linked to a Canvas Module the day before lecture.
* No sections this week.
* Only I will be running office hours this week (all virtual).
    + I will have extra office hours on Thursday from 10am - noon.

****************************

--

## Week 1 Goals

.pull-left[

**Monday Lecture**

* Statistical thinking
* Introduction to data
* Hand-drawn data visualizations

]

--

.pull-right[

**Wednesday Lecture**

* Stat 100 assessments
* Getting up and running in `RStudio`
* Working with `RMarkdown` documents

]

---
background-image: url("img/structures.001.jpeg")
background-position: contain
background-size: 65%

## Stat 100 Tech & Materials

---
class: inverse

## The Rest of the Teaching Team

<img src="img/dreamteam.001.jpeg" width="95%" style="display: block; margin: auto;" />

---
class: inverse, middle, center

## Stat 100 is about developing our .mustard[statistical thinking] skills.

--

### What is .mustard[statistical thinking]?

--

### Let's collect some data.

--

### Practice round: Give keywords or phrases for Harvard.

---
class: inverse, middle, center

## Stat 100 is about developing our .mustard[statistical thinking] skills.

### What is .mustard[statistical thinking]?

--

### It is not the same as mathematical thinking.

--

### Let's discover what .mustard[statistical thinking] is by practicing .mustard[statistical thinking].

---
class: inverse, middle, center

## Start the Statistical Thinking worksheet in small groups.

### We will come back together to discuss.

---

## Problem 1 Discussion

<table>
<thead>
<tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> Minority </th> <th style="text-align:right;"> White </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Acquitted </td> <td style="text-align:right;width: 2cm; "> 60 </td> <td style="text-align:right;width: 2cm; "> 86 </td> </tr>
<tr> <td style="text-align:left;"> Convicted </td> <td style="text-align:right;width: 2cm; "> 29 </td> <td style="text-align:right;width: 2cm; "> 45 </td> </tr>
<tr> <td style="text-align:left;"> Total </td> <td style="text-align:right;width: 2cm; "> 89 </td> <td style="text-align:right;width: 2cm; "> 131 </td> </tr>
</tbody>
</table>

* Overall, which group was convicted at a higher rate?

<br>

* When the victim was white, which group was convicted at a higher rate?

<br>

* When the victim was a minority, which group was convicted at a higher rate?

--

.mauve[HOW IS THIS POSSIBLE?]

---

## Simpson's Paradox

.mauve[HOW IS THIS POSSIBLE?]

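One way to see how this is possible is to recompute the conviction rates directly from the counts, first pooled over the victim's race and then split by it. Below is a minimal Python sketch (Python is used only for illustration here; the course itself works in R). The dictionary layout and the helper names `conviction_rate` and `pooled_rate` are assumptions made for this example; the counts are copied from the table on this slide.

```python
# Counts copied from the table on this slide.
# Key: (defendant, victim); value: (convicted, acquitted).
# The layout and the function names are illustrative assumptions.
counts = {
    ("Minority", "Minority"): (19, 45),
    ("Minority", "White"): (10, 15),
    ("White", "Minority"): (5, 19),
    ("White", "White"): (40, 67),
}


def conviction_rate(convicted, acquitted):
    """Proportion convicted among all cases in a group."""
    return convicted / (convicted + acquitted)


def pooled_rate(defendant):
    """Conviction rate for one defendant group, ignoring the victim's race."""
    convicted = sum(c for (d, _), (c, _a) in counts.items() if d == defendant)
    acquitted = sum(a for (d, _), (_c, a) in counts.items() if d == defendant)
    return conviction_rate(convicted, acquitted)


for defendant in ("Minority", "White"):
    # Pooled rate first, then the rate within each victim group.
    print(f"{defendant} defendants overall: {pooled_rate(defendant):.1%}")
    for victim in ("Minority", "White"):
        rate = conviction_rate(*counts[(defendant, victim)])
        print(f"  {victim} victim: {rate:.1%}")
```

Running it shows the pooled comparison and the within-group comparisons pointing in opposite directions: white defendants have the higher rate overall, while minority defendants have the higher rate within each victim group.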
<table>
<thead>
<tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> Minority Defendant Convicted </th> <th style="text-align:right;"> Minority Defendant Acquitted </th> <th style="text-align:right;"> White Defendant Convicted </th> <th style="text-align:right;"> White Defendant Acquitted </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Minority Victim </td> <td style="text-align:right;width: 2cm; "> 19 </td> <td style="text-align:right;width: 2cm; "> 45 </td> <td style="text-align:right;width: 2cm; "> 5 </td> <td style="text-align:right;width: 2cm; "> 19 </td> </tr>
<tr> <td style="text-align:left;"> White Victim </td> <td style="text-align:right;width: 2cm; "> 10 </td> <td style="text-align:right;width: 2cm; "> 15 </td> <td style="text-align:right;width: 2cm; "> 40 </td> <td style="text-align:right;width: 2cm; "> 67 </td> </tr>
<tr> <td style="text-align:left;"> Total </td> <td style="text-align:right;width: 2cm; "> 29 </td> <td style="text-align:right;width: 2cm; "> 60 </td> <td style="text-align:right;width: 2cm; "> 45 </td> <td style="text-align:right;width: 2cm; "> 86 </td> </tr>
</tbody>
</table>

**Key factors:**

--

For which race of victim is the conviction rate higher?

--

→ The conviction rate is 37.9% for white victims and 27.3% for minority victims.

--

When the defendant is white, what tends to be the race of the victim?

--

→ White defendants tend to have white victims. Minority defendants tend to have minority victims.

---

#### Problem 2 Discussion

<img src="img/GAcovid.jpg" width="90%" style="display: block; margin: auto;" />

---

#### Problem 2 Discussion

<img src="img/GAcovid.jpg" width="40%" style="display: block; margin: auto;" />

<img width="45%" src="img/GAcovid2.jpg"/> <img width="49%" src="img/GAcovid_cairo.png"/>

---
class: middle, inverse, center

## What is "Statistical Thinking"?

---

## Statistical Thinking

.pull-left[

* Importance of the appropriate **measures/metrics**.

]

--

.pull-right[

→ Considering **proportions** instead of the **raw counts**.

]

--

.pull-left[

* Utilizing **multivariate** thinking.

]

--

.pull-right[

→ When we added a **third** variable (race of the victim) into the picture, the story completely changed!

]

--

.pull-left[

* Understanding the importance of **context**.

]

--

.pull-right[

→ Context explained the Monday jumps in the COVID counts.

]

--

.pull-left[

* How we **encode** information in graphs matters.

]

--

.pull-right[

→ **Design choices** impact the conclusions the viewer draws.

]

--

* And so much more!

---
class: middle, inverse, center

## What are data?

---

* The dictionary definition:

> "data: factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation" -- Merriam-Webster

--

* Wikipedia:

> "Data are characteristics or information, usually numerical, that are collected through observation. In a more technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects, while a datum (singular of data) is a single value of a single variable."

---

* Our textbook definition:

> "Data comes to us in a variety of formats, from pictures to text to numbers." -- ModernDive

--

* Data Feminism:

> "... by the time that information becomes data, it's already been classified in some way. Data after all, is information made *tractable*."
-- D'Ignazio and Klein

---

## Data Frames

<table class="table table-responsive table-bordered table-striped" style="font-size: 12px; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> UserID </th> <th style="text-align:right;"> Tree_Height </th> <th style="text-align:left;"> Common_Name </th> <th style="text-align:left;"> Park </th> <th style="text-align:right;"> DBH </th> <th style="text-align:left;"> Species_Factoid </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> 1 </td> <td style="text-align:right;"> 105 </td> <td style="text-align:left;"> Douglas-Fir </td> <td style="text-align:left;"> Gammans Park </td> <td style="text-align:right;"> 37.4 </td> <td style="text-align:left;"> Bracts on cones look like a mouse's feet and tail. </td> </tr>
<tr> <td style="text-align:left;"> 2 </td> <td style="text-align:right;"> 94 </td> <td style="text-align:left;"> Douglas-Fir </td> <td style="text-align:left;"> Gammans Park </td> <td style="text-align:right;"> 32.5 </td> <td style="text-align:left;"> Bracts on cones look like a mouse's feet and tail. </td> </tr>
<tr> <td style="text-align:left;"> 3 </td> <td style="text-align:right;"> 23 </td> <td style="text-align:left;"> Lavalle Hawthorn </td> <td style="text-align:left;"> Gammans Park </td> <td style="text-align:right;"> 9.7 </td> <td style="text-align:left;"> Like most hawthorns, the tree has stout thorns up to 2" long. </td> </tr>
<tr> <td style="text-align:left;"> 4 </td> <td style="text-align:right;"> 28 </td> <td style="text-align:left;"> Northern Red Oak </td> <td style="text-align:left;"> Gammans Park </td> <td style="text-align:right;"> 10.3 </td> <td style="text-align:left;"> Acorns take two years to mature and are an important food source for wildlife.
</td> </tr>
<tr> <td style="text-align:left;"> 5 </td> <td style="text-align:right;"> 102 </td> <td style="text-align:left;"> Douglas-Fir </td> <td style="text-align:left;"> Gammans Park </td> <td style="text-align:right;"> 33.2 </td> <td style="text-align:left;"> Bracts on cones look like a mouse's feet and tail. </td> </tr>
<tr> <td style="text-align:left;"> 6 </td> <td style="text-align:right;"> 95 </td> <td style="text-align:left;"> Douglas-Fir </td> <td style="text-align:left;"> Gammans Park </td> <td style="text-align:right;"> 32.1 </td> <td style="text-align:left;"> Bracts on cones look like a mouse's feet and tail. </td> </tr>
</tbody>
</table>

Data in spreadsheet-like format where:

--

* Rows = Observations/cases

--

* Columns = Variables

---

## Data Frames

<table class="table table-responsive table-bordered table-striped" style="font-size: 12px; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> UserID </th> <th style="text-align:right;"> Tree_Height </th> <th style="text-align:left;"> Common_Name </th> <th style="text-align:left;"> Park </th> <th style="text-align:right;"> DBH </th> <th style="text-align:left;"> Species_Factoid </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> 1 </td> <td style="text-align:right;"> 105 </td> <td style="text-align:left;"> Douglas-Fir </td> <td style="text-align:left;"> Gammans Park </td> <td style="text-align:right;"> 37.4 </td> <td style="text-align:left;"> Bracts on cones look like a mouse's feet and tail. </td> </tr>
<tr> <td style="text-align:left;"> 2 </td> <td style="text-align:right;"> 94 </td> <td style="text-align:left;"> Douglas-Fir </td> <td style="text-align:left;"> Gammans Park </td> <td style="text-align:right;"> 32.5 </td> <td style="text-align:left;"> Bracts on cones look like a mouse's feet and tail.
</td> </tr>
<tr> <td style="text-align:left;"> 3 </td> <td style="text-align:right;"> 23 </td> <td style="text-align:left;"> Lavalle Hawthorn </td> <td style="text-align:left;"> Gammans Park </td> <td style="text-align:right;"> 9.7 </td> <td style="text-align:left;"> Like most hawthorns, the tree has stout thorns up to 2" long. </td> </tr>
<tr> <td style="text-align:left;"> 4 </td> <td style="text-align:right;"> 28 </td> <td style="text-align:left;"> Northern Red Oak </td> <td style="text-align:left;"> Gammans Park </td> <td style="text-align:right;"> 10.3 </td> <td style="text-align:left;"> Acorns take two years to mature and are an important food source for wildlife. </td> </tr>
<tr> <td style="text-align:left;"> 5 </td> <td style="text-align:right;"> 102 </td> <td style="text-align:left;"> Douglas-Fir </td> <td style="text-align:left;"> Gammans Park </td> <td style="text-align:right;"> 33.2 </td> <td style="text-align:left;"> Bracts on cones look like a mouse's feet and tail. </td> </tr>
<tr> <td style="text-align:left;"> 6 </td> <td style="text-align:right;"> 95 </td> <td style="text-align:left;"> Douglas-Fir </td> <td style="text-align:left;"> Gammans Park </td> <td style="text-align:right;"> 32.1 </td> <td style="text-align:left;"> Bracts on cones look like a mouse's feet and tail. </td> </tr>
</tbody>
</table>

Rows = Observations/cases

**What are the cases?
What does each row represent?**

---

## Data Frames

<table class="table table-responsive table-bordered table-striped" style="font-size: 12px; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> UserID </th> <th style="text-align:right;"> Tree_Height </th> <th style="text-align:left;"> Common_Name </th> <th style="text-align:left;"> Park </th> <th style="text-align:right;"> DBH </th> <th style="text-align:left;"> Species_Factoid </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> 1 </td> <td style="text-align:right;"> 105 </td> <td style="text-align:left;"> Douglas-Fir </td> <td style="text-align:left;"> Gammans Park </td> <td style="text-align:right;"> 37.4 </td> <td style="text-align:left;"> Bracts on cones look like a mouse's feet and tail. </td> </tr>
<tr> <td style="text-align:left;"> 2 </td> <td style="text-align:right;"> 94 </td> <td style="text-align:left;"> Douglas-Fir </td> <td style="text-align:left;"> Gammans Park </td> <td style="text-align:right;"> 32.5 </td> <td style="text-align:left;"> Bracts on cones look like a mouse's feet and tail. </td> </tr>
<tr> <td style="text-align:left;"> 3 </td> <td style="text-align:right;"> 23 </td> <td style="text-align:left;"> Lavalle Hawthorn </td> <td style="text-align:left;"> Gammans Park </td> <td style="text-align:right;"> 9.7 </td> <td style="text-align:left;"> Like most hawthorns, the tree has stout thorns up to 2" long. </td> </tr>
<tr> <td style="text-align:left;"> 4 </td> <td style="text-align:right;"> 28 </td> <td style="text-align:left;"> Northern Red Oak </td> <td style="text-align:left;"> Gammans Park </td> <td style="text-align:right;"> 10.3 </td> <td style="text-align:left;"> Acorns take two years to mature and are an important food source for wildlife.
</td> </tr>
<tr> <td style="text-align:left;"> 5 </td> <td style="text-align:right;"> 102 </td> <td style="text-align:left;"> Douglas-Fir </td> <td style="text-align:left;"> Gammans Park </td> <td style="text-align:right;"> 33.2 </td> <td style="text-align:left;"> Bracts on cones look like a mouse's feet and tail. </td> </tr>
<tr> <td style="text-align:left;"> 6 </td> <td style="text-align:right;"> 95 </td> <td style="text-align:left;"> Douglas-Fir </td> <td style="text-align:left;"> Gammans Park </td> <td style="text-align:right;"> 32.1 </td> <td style="text-align:left;"> Bracts on cones look like a mouse's feet and tail. </td> </tr>
</tbody>
</table>

Columns = Variables

**Variables**: Describe characteristics of the observations

--

* **Quantitative**: Numerical in nature

--

* **Categorical**: Values are categories

--

* **Identification**: Uniquely identify each case

---

## Data Frames

<table class="table table-responsive table-bordered table-striped" style="font-size: 12px; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> UserID </th> <th style="text-align:right;"> Tree_Height </th> <th style="text-align:left;"> Common_Name </th> <th style="text-align:left;"> Park </th> <th style="text-align:right;"> DBH </th> <th style="text-align:left;"> Species_Factoid </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> 1 </td> <td style="text-align:right;"> 105 </td> <td style="text-align:left;"> Douglas-Fir </td> <td style="text-align:left;"> Gammans Park </td> <td style="text-align:right;"> 37.4 </td> <td style="text-align:left;"> Bracts on cones look like a mouse's feet and tail. </td> </tr>
<tr> <td style="text-align:left;"> 2 </td> <td style="text-align:right;"> 94 </td> <td style="text-align:left;"> Douglas-Fir </td> <td style="text-align:left;"> Gammans Park </td> <td style="text-align:right;"> 32.5 </td> <td style="text-align:left;"> Bracts on cones look like a mouse's feet and tail.
</td> </tr>
<tr> <td style="text-align:left;"> 3 </td> <td style="text-align:right;"> 23 </td> <td style="text-align:left;"> Lavalle Hawthorn </td> <td style="text-align:left;"> Gammans Park </td> <td style="text-align:right;"> 9.7 </td> <td style="text-align:left;"> Like most hawthorns, the tree has stout thorns up to 2" long. </td> </tr>
<tr> <td style="text-align:left;"> 4 </td> <td style="text-align:right;"> 28 </td> <td style="text-align:left;"> Northern Red Oak </td> <td style="text-align:left;"> Gammans Park </td> <td style="text-align:right;"> 10.3 </td> <td style="text-align:left;"> Acorns take two years to mature and are an important food source for wildlife. </td> </tr>
<tr> <td style="text-align:left;"> 5 </td> <td style="text-align:right;"> 102 </td> <td style="text-align:left;"> Douglas-Fir </td> <td style="text-align:left;"> Gammans Park </td> <td style="text-align:right;"> 33.2 </td> <td style="text-align:left;"> Bracts on cones look like a mouse's feet and tail. </td> </tr>
<tr> <td style="text-align:left;"> 6 </td> <td style="text-align:right;"> 95 </td> <td style="text-align:left;"> Douglas-Fir </td> <td style="text-align:left;"> Gammans Park </td> <td style="text-align:right;"> 32.1 </td> <td style="text-align:left;"> Bracts on cones look like a mouse's feet and tail. </td> </tr>
</tbody>
</table>

**It is important to understand what each variable represents and its units of measurement.**

--

Example questions:

* For categorical variables, what are the categories? Do those categories adequately capture what the variable is meant to record?

--

* For quantitative variables, what values are possible? Were the data rounded or binned? Are those values actually encoding categories?

---

## Hand-Drawn Data Viz

* Two key aspects of data visualization:
    + Determining how you want to display the data.
    + Figuring out how to tell the computer to do that mapping.

--

* Hand-drawn data visualizations allow us to focus on the first part and give us full control over the creative process!

---

## Hand-Drawn Data Viz Examples

* [Dear Data](http://www.dear-data.com/theproject)

> "Each week, and for a year, we collected and measured a particular type of data about our lives, used this data to make a drawing on a postcard-sized sheet of paper, and then dropped the postcard in an English “postbox” (Stefanie) or an American “mailbox” (Giorgia)!"

---

### Dear Data Examples

<img src="img/dearDataTime.png" width="73%" style="display: block; margin: auto;" />

---

### Dear Data Examples

<img src="img/dearDataComplaints.png" width="73%" style="display: block; margin: auto;" />

---

## Mapping Manhattan

* Becky Cooper handed out hand-drawn maps of Manhattan to strangers and asked them to ["map their Manhattan."](https://www.goodreads.com/book/show/15842664-mapping-manhattan?from_search=true)

<div class="figure" style="text-align: center">
<img src="img/mapmanhattan.png" alt="Map drawn by New Yorker staff writer Patricia Marx" width="100%" />
<p class="caption">Map drawn by New Yorker staff writer Patricia Marx</p>
</div>

---

### The Start of Problem Set 1: Create your own Dear Data postcard!

.pull-left[

**Step 1**

* This week, collect data on some aspect of your life.

**Step 2**

* Find a story in your data and determine your postcard recipient.
* Figure out how you want to visualize the story.

**Step 3+**

* Next week you will get your blank postcard so you can actually create your visualization.

]

.pull-right[

<img src="img/supplies.jpg" width="60%" style="display: block; margin: auto;" />

]

---

### More Dear Data Examples

<img src="img/postcards.001.jpeg" width="73%" style="display: block; margin: auto;" />

---

### More Dear Data Examples

<img src="img/postcards.002.jpeg" width="73%" style="display: block; margin: auto;" />

---

## Reminders

* You will get the rest of P-Set 1 on Wednesday.
* Make sure to go through the syllabus and the Overview video (both can be found in the Getting Started Module on Canvas).
    + We will discuss assessments (p-sets, project, exams, engagement) a bit on Wednesday.
    + But I will assume you have looked over the Getting Started materials.
* No sections this week.
* Only I will be running [office hours this week (all virtual)](https://canvas.harvard.edu/courses/102162/pages/check-out-the-office-hours-schedule).
    + I will have extra office hours on Thursday from 10am - noon.
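
---

## Data Frames: A Small Sketch in Code

The Data Frames slides describe rows as cases and columns as variables, with some variables identifying cases, some categorical, and some quantitative. Below is a minimal Python sketch of those ideas (Python is used only for illustration; the course itself works in R). The list-of-dictionaries layout and the helper names `column` and `is_identifier` are assumptions made for this example, and the `Species_Factoid` column is left out to keep it short. The values are copied from the table on those slides.

```python
from collections import Counter

# Rows copied from the table on the Data Frames slides.
# Each dictionary is one row (one case); each key is one variable.
# Species_Factoid is omitted here only to keep the sketch short.
trees = [
    {"UserID": 1, "Tree_Height": 105, "Common_Name": "Douglas-Fir", "Park": "Gammans Park", "DBH": 37.4},
    {"UserID": 2, "Tree_Height": 94, "Common_Name": "Douglas-Fir", "Park": "Gammans Park", "DBH": 32.5},
    {"UserID": 3, "Tree_Height": 23, "Common_Name": "Lavalle Hawthorn", "Park": "Gammans Park", "DBH": 9.7},
    {"UserID": 4, "Tree_Height": 28, "Common_Name": "Northern Red Oak", "Park": "Gammans Park", "DBH": 10.3},
    {"UserID": 5, "Tree_Height": 102, "Common_Name": "Douglas-Fir", "Park": "Gammans Park", "DBH": 33.2},
    {"UserID": 6, "Tree_Height": 95, "Common_Name": "Douglas-Fir", "Park": "Gammans Park", "DBH": 32.1},
]


def column(name):
    """Pull one variable (column) out of the rows."""
    return [row[name] for row in trees]


def is_identifier(name):
    """Check that a variable takes a different value for every case."""
    values = column(name)
    return len(set(values)) == len(values)


print("cases:", len(trees))
print("variables:", list(trees[0]))
print("UserID distinct for every case:", is_identifier("UserID"))
print("Common_Name categories:", Counter(column("Common_Name")))
print("mean Tree_Height:", sum(column("Tree_Height")) / len(trees))
```

Note that distinct values alone do not make a variable an identifier: `Tree_Height` also happens to be distinct across these six rows. Deciding what role a variable plays still takes the context questions from the previous slides.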