library(readr)
nyc_recent_noise <- read_csv("https://data.cityofnewyork.us/resource/erm2-nwe9.csv?complaint_type=Noise%20%2D%20Commercial&$limit=200")
head(nyc_recent_noise)
SDS 192: Introduction to Data Science
Lindsay Poirier
Statistical & Data Sciences, Smith College
Fall 2022
200: Success!
403: Forbidden
404: Not Found
500: Internal Server Error
502: Bad Gateway
Figure: REST API - Author: Seobility - License: CC BY-SA 4.0
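As a minimal sketch of how these codes might be handled in R, the hypothetical helper below (`describe_status()` is not part of any package) looks up the message for a status code. The commented lines assume the httr package is installed and show one way the code could be obtained from a live response.

```r
# Hypothetical helper: translate an HTTP status code into the messages listed above.
describe_status <- function(code) {
  known <- c(
    "200" = "Success!",
    "403" = "Forbidden",
    "404" = "Not Found",
    "500" = "Internal Server Error",
    "502" = "Bad Gateway"
  )
  key <- as.character(code)
  # Fall back to a generic label for codes not covered in this list
  if (key %in% names(known)) known[[key]] else "Other status"
}

describe_status(404)

# With a live request (assumes the httr package; not run here):
# resp <- httr::GET("https://data.cityofnewyork.us/resource/erm2-nwe9.csv?$limit=1")
# describe_status(httr::status_code(resp))
```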
The base URL is the API endpoint: https://data.cityofnewyork.us/resource/erm2-nwe9.csv
?: marks the start of the query parameters
&: separates one query parameter from the next
$limit=: limits the number of rows downloaded to a certain number
https://data.cityofnewyork.us/resource/erm2-nwe9.csv?unique_key=10693408
https://data.cityofnewyork.us/resource/erm2-nwe9.json?complaint_type=Obstruction&$limit=100
https://dev.socrata.com/foundry/data.cityofnewyork.us/erm2-nwe9
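The pieces of a query URL can also be assembled in R. The sketch below is one possible approach, not the only one: `build_query_url()` is a hypothetical helper that joins an endpoint to named parameters using `?` and `&`. It does not encode special characters, so it assumes the values are already URL-safe.

```r
# Hypothetical helper: join an endpoint and named parameters into a query URL.
build_query_url <- function(endpoint, params) {
  # Convert each value to text without scientific notation (e.g. 100000, not 1e+05)
  values <- vapply(
    params,
    function(x) format(x, scientific = FALSE, trim = TRUE),
    character(1)
  )
  # Each parameter becomes "name=value"
  pairs <- paste0(names(params), "=", values)
  # "?" starts the query string; "&" separates the parameters
  paste0(endpoint, "?", paste(pairs, collapse = "&"))
}

endpoint <- "https://data.cityofnewyork.us/resource/erm2-nwe9.json"
url <- build_query_url(endpoint, list(complaint_type = "Obstruction", "$limit" = 100))
url
```

The resulting string could then be passed to a reader such as `read_csv()` (for a `.csv` endpoint) to download the rows.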
URLs can only contain a limited set of characters, so spaces and other special characters (e.g. non-ASCII characters) are replaced with percent-encoded codes that URLs do allow:
(space): %20
!: %21
": %22
%: %25
': %27
-: %2D
There are many resources online for identifying these.
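R can also do this encoding for us. The sketch below uses `URLencode()` and `URLdecode()` from the built-in utils package. Note that `URLencode()` leaves the hyphen as-is, since a hyphen is allowed in URLs; writing it as `%2D` is an equally valid spelling that decodes to the same character.

```r
# Encode a value before placing it in a URL.
# reserved = TRUE also encodes characters such as ' and = that have special meaning in URLs.
complaint <- "Noise - Commercial"
encoded <- utils::URLencode(complaint, reserved = TRUE)
encoded

# Decoding reverses the process, including codes such as %2D for the hyphen
decoded <- utils::URLdecode("Noise%20%2D%20Commercial")
decoded
```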
In R, read_csv() can read data directly from an API URL.
dplyr vs. SQL
select(): SELECT
filter(): WHERE
group_by(): GROUP BY
arrange(): ORDER BY
head(): LIMIT
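To make the correspondence concrete, here is a small sketch that pairs each dplyr verb with its SQL counterpart in the comments. The `complaints` data frame is made-up toy data for illustration only, not real 311 records, and the column names are assumptions borrowed from the 311 dataset.

```r
library(dplyr)

# Toy data for illustration only (not real 311 records)
complaints <- data.frame(
  complaint_type = c("Traffic", "Traffic", "Traffic", "Noise"),
  descriptor = c("Congestion/Gridlock", "Truck Route Violation",
                 "Congestion/Gridlock", "Loud Music/Party")
)

result <- complaints %>%
  filter(complaint_type == "Traffic") %>%  # WHERE complaint_type = 'Traffic'
  group_by(descriptor) %>%                 # GROUP BY descriptor
  summarize(count = n()) %>%               # SELECT descriptor, count(*)
  arrange(desc(count)) %>%                 # ORDER BY count DESC
  head(1)                                  # LIMIT 1

result
```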
SQL in APIs
SQL can be written in the URLs constructed for API calls.
SoQL
SELECT unique_key, created_date, incident_address
WHERE descriptor = 'Pothole'
LIMIT 100
SELECT descriptor, count(*)
WHERE complaint_type = 'Traffic'
GROUP BY descriptor
ORDER BY count DESC
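In a URL, each SoQL clause is passed as its own parameter: `$select`, `$where`, `$group`, `$order`, and `$limit`. The sketch below shows one way the Pothole query might be turned into a URL in R. `soql_url()` is a hypothetical helper, and the values are percent-encoded with `URLencode()` because they contain spaces, commas, and quotes.

```r
# Hypothetical helper: build a SoQL URL from named clauses, percent-encoding each value.
soql_url <- function(endpoint, clauses) {
  encoded <- vapply(
    clauses,
    function(x) utils::URLencode(as.character(x), reserved = TRUE),
    character(1)
  )
  # The "$" prefix marks each name as a SoQL clause rather than a column filter
  pairs <- paste0("$", names(clauses), "=", encoded)
  paste0(endpoint, "?", paste(pairs, collapse = "&"))
}

pothole_url <- soql_url(
  "https://data.cityofnewyork.us/resource/erm2-nwe9.csv",
  list(
    select = "unique_key,created_date,incident_address",
    where = "descriptor = 'Pothole'",
    limit = "100"
  )
)
pothole_url

# The URL could then be read like any other CSV (not run here):
# potholes <- readr::read_csv(pothole_url)
```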