Lab 9: Mapping with Leaflet

Author

Lindsay Poirier

Introduction

In this lab, we will build a map that visualizes unaddressed housing violations in NYC in order to identify some of the city’s worst landlords. In doing so, we will gain practice in producing point maps in Leaflet. This analysis is based off of a similar analysis conducted by the NYC Public Advocate’s Office.

Learning Goals

  • Transform point data to an appropriate CRS
  • Map point data in Leaflet
  • Create palettes for points on maps
  • Add legends and labels to a map

Review of Key Terms

Tip

You may wish to reference this reference guide when completing this lab.

Coordinate Reference System (CRS)

a system used to locate geographic points on different spatial projections

NYC Public Advocate’s Worst Landlords Watchlist

The NYC Public Advocate (currently Jumaane D. Willaims) publishes a list of NYC’s worst landlords every year by tracking the number of class B and C Housing Preservation and Development (HPD) violations in buildings owned by various people and companies in the City.

From the Public Advocate’s website:

Examples of Class B violations include: failing to provide self-closing public doors or adequate lighting in public areas, lack of posted Certificates of Occupancy, or failure to remove vermin. Class C violations include: immediately hazardous violations such as rodents, lead-based paint, and lack of heat, hot water, electricity, or gas.1 1See more here.

I have already constructed a dataset documenting the 100 rental properties with the most average housing violations approved in 2021. If you want to check out the code to construct that dataset, you can review the code in data-import.R in this directory. Run the code below to import the dataset.

Setting Up Your Environment

  1. Install the following packages in your Environment:
  • install.packages("leaflet")
  • install.packages("sf")
  1. Run the code below to load today’s data frames into your environment.
library(tidyverse)
library(leaflet)
library(RColorBrewer)
library(sf)

worst_buildings <- read_csv("https://raw.githubusercontent.com/SDS-192-Intro/sds-192-labs/main/Day30-Leaflet/datasets/worst_buildings.csv")

Mapping in Leaflet

We are going to work with the leaflet() package to design a map of worst 100 NYC rental properties in terms of housing violations. The map will color the points by the average number of violations and size the points by the number of units in the building. At any point during these exercises, you can reference the leaflet documentation to help you build out these maps.

Initialize your Map

There are three steps to initializing a map in leaflet:

  1. Call the leaflet() function to create a map widget.
  2. Call the setView(lat = 0.00, lng = 0.00, zoom = 0). This function determines where the map will initially focus. We provide a set of coordinates that will be the map’s center point when we load it, and a number (from 1 to 16) to indicate the level at which to zoom in on the map.
  3. Call addProviderTiles() to load up provider tiles that constitute a base map. A number of different map providers have provider tiles that we can reference here. A few examples of the arguments we can supply to this function include include:
  • providers$OpenStreetMap
  • providers$Stamen.Toner
  • providers$CartoDB.Positron
  • providers$Esri.NatGeoWorldMap

Run the code below to initialize a map.

map1 <- leaflet(width = "100%") %>%
  setView(lat = 0.00, lng = 0.00, zoom = 0) %>%
  addProviderTiles(providers$Stamen.Toner)

map1
Question

Adjust the code below to center the map on NYC. You’ll need to look up the coordinates for NYC, keeping in mind that South and West coordinates will be negative. Adjust the zoom to keep the whole city in view (setting the zoom level between 1 and 16). When you are happy with the View, switch out the provider tiles to a base map that won’t distract from the data points we will layer on top of this map. Keep in mind our discussions regarding Visualization Aesthetics here.

nyc_map <- leaflet(width = "100%") %>%
  setView(lat = 0.00, lng = 0.00, zoom = 0) %>%
  addProviderTiles(providers$Stamen.Toner)

# Uncomment below to view the map!
# nyc_map

Set Geometry and Transform Data CRS

We will be looking to convert our data into an object with coordinate-based geometry so that we can map it. We can use the st_as_sf() function to do this. st_as_sf() takes two arguments:

  1. The names of the columns in our data frame containing geographic coodinates in the format coords = c("<longitude column names>", "<latitude column name>")
  2. The coordinate reference system that the dataset currently uses in the format crs = <crs number>.

After adding this geometry column with st_as_sf(), we need to make sure that the CRS for our coordinates is consistent with the CRS of the basemap we will be placing the points on. The coordinates in worst_buildings data are in NAD 83 (EPSG:4269) and our basemap is in WGS 84 (EPSG:4326).

Question

In the code block below, create a new geometry column with the given coordinates, using the function st_as_sf(). The column for longitude will go in the first coordinate position, and latitude will go in the second. Be sure to set the data’s current CRS (4269) in that function. Then transform the CRS to 4326 using st_transform().

# Uncomment and fill in the blanks below. The column for longitude will go in the first coordinate position, and latitude will go in the second. 

#worst_buildings <- worst_buildings %>%
#  st_as_sf(coords = c("____", "____"), crs = ____) %>%
#  st_transform(____)

Add Layers

Once we have our initial map, we can add layers to the map that display different forms of geospatial data. There are a number of different functions in leaflet that we can use to add layers. For instance, we can add markers to the map at a certain geo-coordinates using the addMarkers() function. We can also add polygons and rectangles to the map using the addRectangles() function or the addPolygons() function. Today we are going to work exclusively with the addCircleMarkers() function. This allows us to add a circle at the latitude and longitude for each row in our dataset. It also allows us to adjust the circle’s color and size according to values in our dataset.

Question

In the code block below, pipe addCircleMarkers() onto the basemap and list data = worst_buildings as an argument in that function. It should look like my map below.

# Uncomment and add the function

#nyc_map %>%

Map isn’t so legible/beautiful at this point, right?

Styling the Map

Question

Map isn’t so legible/beautiful at this point, right? Copy the map you just created into the code chunk below, and pipe the addCircleMarkers() function onto your code. Check out the help pages for the addCircleMarkers() function, and add some arguments to help with the map’s legibility. At the very least, you should adjust the radius, weight, color, fillColor, and fillOpacity, and understand how each of these arguments will change the style of the map. For now you can set the color to “black” and the fillColor to “white”. See if you can get your map to match mine below.

# Copy previous map here! 

Creating Color Palettes

Now that our map is looking more legible, let’s color the circles by the 2021 average open violations at each property on the map. Remembering back to our lesson on Understanding Datasets and Visualization Aesthetics, we should keep in mind that avg_violations_weighted is a numeric variable, and therefore, we will create a sequential color palette to map it.

There are three functions in leaflet for creating a sequential color palette:

  1. colorNumeric(): Creates a palette by assigning numbers to different colors on a spectrum
  2. colorBin(): Creates a palette by grouping numbers into a specified number of equally-spaced intervals (e.g. 0-10, >10-20, >20-30)
  3. colorQuantile(): Creates a palette by grouping numbers into a specified number of equally-sized quantiles

Note that we would use colorFactor() to create a categorical palette.

Each function takes both of the following arguments:

  • palette: the colors we wish to map to the data. We will use preset palettes from the RColorBrewer. You can call display.brewer.all() to see the list of palettes or reference here: http://applied-r.com/rcolorbrewer-palettes/
  • domain: the values we wish to apply the palette to. Here we reference the column from the data frame we wish to color the points by using the accessor $.

In addition, the colorBin() function takes the argument bins, which indicates the number of color intervals to be created. The colorQuantile() functions takes the argument n, which indicates the number of quantiles to map data into.

Question

Create three palettes below (one using each function colorNumeric(), colorBin(), and colorQuantile()), setting the palette to “YlOrRd”, and the domain to worst_buildings$avg_violations_weighted. For colorBin(), set the bins to 4, and for colorQuantile(), set n to 4.

pal_num <- colorNumeric(palette="YlOrRd", 
                        domain = worst_buildings$avg_violations_weighted)
#Uncomment below to create the other two palettes.

#pal_bin <- 

#pal_quant <- 
Question

Copy and paste the last map you created into the code chunk below, three times. For each, we are going to adjust the fillColor by setting it to the variable from the dataset that we wish to color (avg_violations_weighted), colored by one of the palettes that we created. We can do this by setting the fill color argument equal to:

  • ~pal_num(avg_violations_weighted): for coloring the points according the pal_num() palette we created on the avg_violations_weighted variable
  • ~pal_bin(avg_violations_weighted): for coloring the points according the pal_bin() palette we created on the avg_violations_weighted variable
  • ~pal_quant(avg_violations_weighted): for coloring the points according the pal_quant palette we created on the avg_violations_weighted() variable

Your maps should look like mine below.

# Copy map first time here!

# Copy map second time here!

# Copy map third time here!
Question

In a comment below, explain the following: Why do the colors appear differently on each map? Which map best represents the distribution of values in avg_violations_weighted?

# Add comment here. 

Sizing the Circles

Question

Copy and paste the code from above that colors avg_violations_weighted in bins into the code chunk below. We are going to size each circle by the number of rental units in that property. Number of units is stored in the column legalclassa in our dataset, so we could size the circles by setting the radius to ~legalclassa. However, this would lead to some massive circles on our map as certain buildings have hundreds of units, and the value we supply to radius will determine the pixels of the circle on our map. To deal with this, we are going to take the square root of the units in each property using the sqrt() function. Set the radius to ~sqrt(legalclassa) below.

# Copy map colored by bins here!

Legends and Labels

Question

As a final step, we are going to add labels and legends to the map. Copy and paste the code from the previous step into the code chunk below. Then do the following:

  1. Add the label argument to the addCircleMarkers() function, and set the value to the column in our dataset that indicates the landlord’s full name: ~fullname. Run the code and check what happens when we hover over the circles.
  2. Add a pipe to the end of the addCircleMarkers() and then add the function addLegend(). Consult the help pages for the addLegend() function to determine how to add a legend for the meaning of the colors represented on the map. At the very least, you will need an argument for data, pal, and values.
# Copy previous map here!

Ethical Considerations

The NYC Worst Landlord Watchlist determines worst landlords based on class B and C Housing Preservation and Development (HPD) violations. …but not every issue a tenant faces with respect to their landlord will be documented via an HPD violation. What kinds of issues might go ignored when we use these variables alone to determine worst landlords? How should we as data scientists communicate what this data shows and doesn’t show? Share your ideas on our sds-192-discussions Slack channel.