Data Visualization
Kelly McConville
Stat 100
Week 2 | Fall 2023
Teachly is a platform that allows you to fill out a profile so that we can get to know you and your interests in stats/data science better.
You should have received two emails:
Each question is optional. You will not be assessed on its completion or your answers.
Ways we plan to use Teachly:
First Segment:
Second Segment:
ggplot2
To explore the data.
To summarize the data.
To showcase trends and make comparisons.
To tell a compelling story.
On January 27th, 1986, engineers from Morton Thiokol recommended that NASA delay the launch of the space shuttle Challenger due to cold weather.
After a two-hour conference call, the engineers’ recommendation was overruled due to a lack of persuasive evidence, and the launch proceeded.
The Challenger exploded 73 seconds into launch.
Here’s one of those charts.
Here’s another one of those charts.
Here’s a graphic I created from Edward Tufte’s data.
This adaptation is a recreation of Edward Tufte’s graphic.
We will use this grammar to:
Decompose and understand existing graphs.
Create our own graphs with the R package ggplot2.
For right now, we won’t focus on the names of particular types of graphs (e.g., scatterplot) but on the elements of graphs.
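As a rough illustration of what "elements of graphs" means in practice, the sketch below builds one graph and then pulls its pieces back apart. It assumes the mpg data set that ships with ggplot2 (not a data set from these slides), and the object name p is just a placeholder.

```r
library(ggplot2)

# mpg ships with ggplot2; it stands in for any data frame here (an assumption).
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point()

# Element 1: the data behind the graph (rows and columns)
dim(p$data)

# Element 2: which variables are mapped to which aesthetic attributes
p$mapping

# Element 3: the geometric object drawn by the first (and only) layer
class(p$layers[[1]]$geom)[1]
```

Reading a graph this way (data, mappings, geometric objects) works the same whether or not we know the graph's name.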
For context, at a minimum include
Think about the stories/questions your visualization answers.
Determine what context/background information your viewer needs.
Visualizing data involves editorial choices.
Consider color blindness.
Maps, like the Dude map, are also a great way to provide context!
Washington Post’s Approach:
Because of all the design choices, it is much easier to make a bad graph than a good graph.
Be careful that your design choices don’t cause your viewer to draw incorrect conclusions about the data:
Good graphics are ones where the findings and insights are obvious to the viewer.
Facilitate the comparisons that correspond to the research question.
Data visualizations are not neutral.
It is easier to see the differences and similarities between different types of graphics if we learn the grammar of graphics.
Practicing decomposing graphics should make it easier for us to compose our own graphics.
ggplot2 is part of this collection of data science packages.
Rows: 192
Columns: 8
$ DateTime <chr> "07/04/2019 12:00:00 AM", "07/04/2019 12:15:00 AM", "07/04/2…
$ Day <chr> "Thursday", "Thursday", "Thursday", "Thursday", "Thursday", …
$ Date <date> 2019-07-04, 2019-07-04, 2019-07-04, 2019-07-04, 2019-07-04,…
$ Time <time> 00:00:00, 00:15:00, 00:30:00, 00:45:00, 01:00:00, 01:15:00,…
$ Total <dbl> 2, 3, 2, 0, 3, 2, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, …
$ Westbound <dbl> 2, 3, 1, 0, 2, 2, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, …
$ Eastbound <dbl> 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, …
$ Occasion <chr> "Fourth of July", "Fourth of July", "Fourth of July", "Fourt…
# A tibble: 6 × 8
  DateTime             Day   Date       Time  Total Westbound Eastbound Occasion
  <chr>                <chr> <date>     <tim> <dbl>     <dbl>     <dbl> <chr>
1 07/04/2019 12:00:00… Thur… 2019-07-04 00:00     2         2         0 Fourth …
2 07/04/2019 12:15:00… Thur… 2019-07-04 00:15     3         3         0 Fourth …
3 07/04/2019 12:30:00… Thur… 2019-07-04 00:30     2         1         1 Fourth …
4 07/04/2019 12:45:00… Thur… 2019-07-04 00:45     0         0         0 Fourth …
5 07/04/2019 01:00:00… Thur… 2019-07-04 01:00     3         2         1 Fourth …
6 07/04/2019 01:15:00… Thur… 2019-07-04 01:15     2         2         0 Fourth …
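Output like the two printouts above typically comes from glimpse() and head(). The sketch below is only an approximation: it retypes the six rows shown rather than loading the real file, the name bike_counts is a placeholder, and the DateTime and Occasion columns are left out because their values are truncated in the printout.

```r
library(tibble)
library(dplyr)

# Hand-typed copy of the first six rows shown above (not the full 192-row data).
# bike_counts is a hypothetical name; DateTime and Occasion are omitted.
bike_counts <- tibble(
  Day = "Thursday",
  Date = as.Date("2019-07-04"),
  Time = hms::as_hms(c("00:00:00", "00:15:00", "00:30:00",
                       "00:45:00", "01:00:00", "01:15:00")),
  Total = c(2, 3, 2, 0, 3, 2),
  Westbound = c(2, 3, 1, 0, 2, 2),
  Eastbound = c(0, 0, 1, 0, 1, 0)
)

# One line per column: name, type, and the first few values
glimpse(bike_counts)

# The first rows, printed as a table
head(bike_counts)
```

In these six rows, Total equals Westbound plus Eastbound, and the Time column advances in 15-minute steps.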
What does a row represent here?
ggplot2 example code
Guiding Principle: We will map variables from the data to the aesthetic attributes (e.g. location, size, shape, color) of geometric objects (e.g. points, lines, bars).
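One possible instance of the guiding principle, sketched with the six rows printed earlier: Time is mapped to horizontal location and Total to vertical location, and the geometric objects are points joined by a line. The name bike_counts and the choice of geoms are assumptions made for illustration, not necessarily the code from the slides.

```r
library(ggplot2)
library(tibble)

# Hand-typed subset of the data shown earlier; bike_counts is a hypothetical name.
bike_counts <- tibble(
  Time = hms::as_hms(c("00:00:00", "00:15:00", "00:30:00",
                       "00:45:00", "01:00:00", "01:15:00")),
  Total = c(2, 3, 2, 0, 3, 2)
)

# aes() maps variables to aesthetics; each geom_*() adds a geometric object.
p <- ggplot(data = bike_counts, mapping = aes(x = Time, y = Total)) +
  geom_point() +
  geom_line()

p
```

Other aesthetics such as size, shape, or color are mapped the same way, by naming a variable inside aes().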
scale_---_---() and labs(), but we will wait on those.
Binned counts of data.
Great for assessing shape.
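A minimal histogram sketch, using only the first 20 Total values visible in the glimpse() output above. The name bike_totals and the bin width of 1 are choices made here for illustration.

```r
library(ggplot2)

# First 20 values of Total, copied from the glimpse() output (not the full data).
bike_totals <- data.frame(
  Total = c(2, 3, 2, 0, 3, 2, 1, 0, 0, 0,
            0, 0, 1, 1, 0, 0, 0, 0, 1, 1)
)

# Only x is mapped; geom_histogram() bins the values and counts them for the bar heights.
p_hist <- ggplot(data = bike_totals, mapping = aes(x = Total)) +
  geom_histogram(binwidth = 1)

p_hist
```

Changing binwidth changes how the shape appears, so it is worth trying a few values before describing the distribution.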
aes()
geom_---()
ggplot2