ggplot()

SDS 192: Introduction to Data Science

Lindsay Poirier
Statistical & Data Sciences, Smith College

Fall 2022

For Today

  • Visualizations Exercise
  • Introduction to ggplot()
  • Class Activity

Let’s create the following data frame to motivate today’s lecture.

This dataset comes from Pioneer Valley Data and documents estimates of population characteristics for each municipality in the Pioneer Valley.

library(tidyverse)
pioneer_valley_census_data <- read.csv("https://raw.githubusercontent.com/SDS-192-Intro/SDS-192-public-website/main/slides/datasets/pioneer_valley_census.csv")
hampshire_census_data <- pioneer_valley_census_data %>% 
  filter(COUNTY == "Hampshire")

ggplot

  • Most plots we create in this course will rely on package called ggplot2
  • ggplot2 is included in the Tidyverse, which you installed in SDS 100
  • Load ggplot2 in your environment.
library(ggplot2)

Anatomy of the ggplot() function

  • ggplot() takes two arguments:
    • data: the dataset used to produce the plot (in a data frame)
    • mapping: the variables from the dataset we want mapped onto visual cues
      • mappings are defined in a function called aes() (short for aesthetics)
      • in Cartesian plots, we must supply the variables/columns that will appear on the axes (via x = and y =)

Anatomy of the ggplot() function

ggplot(data = hampshire_census_data, 
       aes(x = COMMUNITY, 
           y = CEN_EARLYED))

Learning check: What’s the scale of the x-axis in the plot you just created? What’s the scale of the y-axis?

Where’s the data?

  • In previous plot, we told R what variables to plot, but we didn’t indicate how to plot them.
  • To do this, we need to add a geom function to our ggplot call. Examples:
    • Bar plots: geom_bar()
    • Scatterplots: geom_point()
  • Appended to function call with a + sign

Adding a geom function

ggplot(data = hampshire_census_data, 
       aes(x = COMMUNITY, 
           y = CEN_EARLYED)) +
  geom_col()

Learning check: What variables are mapped on to what visual cues in this plot?

Styling Plots: Flipping Coordinates

ggplot(data = hampshire_census_data, 
       aes(x = COMMUNITY, 
           y = CEN_EARLYED)) +
  geom_col() +
  coord_flip() # Flipping the x and y coordinates here makes the labels more legible.

Learning check: How’s the data-to-ink ratio on this plot?

Styling Plots: Changing the Theme

ggplot(data = hampshire_census_data, 
       aes(x = COMMUNITY, 
           y = CEN_EARLYED)) +
  geom_col() +
  coord_flip() + # Flipping the x and y coordinates here makes the labels more legible. 
  theme_minimal()

Learning check: What context needs to be added to this plot?

Styling Plots: Adding Labels

ggplot(data = hampshire_census_data, 
       aes(x = COMMUNITY, 
           y = CEN_EARLYED)) +
  geom_col() +
  coord_flip() + # Flipping the x and y coordinates here makes the labels more legible. 
  theme_minimal() +
  labs(title = "Hampshire County Early Education Enrollment Rates, 2018", 
       x = "Enrollment Rate for 3-4 yr old", 
       y = "Municipality in Hampshire County, MA")

Styling Plots: Adjusting the Scale

# Adjust the Scale
ggplot(data = pioneer_valley_census_data, 
       aes(x = COUNTY,y = CEN_WORKERS)) +
  geom_point() +
  coord_flip() +
  scale_y_log10() +
  labs(title = "Number of Workers Age 16+ in Pioneer Valley, MA Municipalities, 2018", x = "County", y = "Workers Age 16+")

Aeshetics vs. Attributes

  • We can adjust the way the data appears on plots in two ways:
    • According to a variable:
      • This must be done inside of the aes() function
    • In a fixed way:
      • This must be done outside of the aes() function

Adjusting Data on Plots via Aeshetics

We add visual cues to the plot in the aes() call

# Add visual cue for size
ggplot(data = pioneer_valley_census_data, 
       aes(x = COUNTY, y = CEN_WORKERS, size = CEN_HOUSEHOLDS)) +
  geom_point() +
  coord_flip() +
  labs(title = "Number of Workers Age 16+ in Pioneer Valley, MA Municipalities, 2018", x = "County", y = "Workers Age 16+", size = "Households")

Adjusting Data on Plots via Attributes

# Add visual cue for size and attribute for transparency
ggplot(data = pioneer_valley_census_data, 
       aes(x = COUNTY, y = CEN_WORKERS, size = CEN_HOUSEHOLDS)) +
  geom_point(alpha = 0.2) +
  coord_flip() +
  labs(title = "Number of Workers Age 16+ in Pioneer Valley, MA Municipalities, 2018", x = "County", y = "Workers Age 16+", size = "Households")

Do I really have to memorize all of these stylistic functions?!

No. There are cheatsheets. The ggplot2() cheatsheet is linked here.