SDS 192: Introduction to Data Science
Lindsay Poirier
Statistical & Data Sciences, Smith College
Fall 2022
Use mutate() to overwrite a variable with a new, cleaned-up variable.
as.character(), as.numeric(), and as.logical() all convert a variable from its original type to a new type.
See the lubridate package and the lubridate cheatsheet.
ymd_hms() will take a date formatted as year, month, day, hour, minute, second and convert it to a date-time format.
NA values: na_if() will take a variable and set specified values to NA.
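The cleaning functions above can be combined in a single mutate() call. The following is a minimal sketch, not code from the course: the data frame raw, its column names, and the use of -999 as a missing-value code are all assumptions made up for illustration.

```r
library(dplyr)
library(lubridate)

# Hypothetical raw data: every column was read in as character
raw <- data.frame(
  id = c("1", "2", "3"),
  measured_at = c("2022-09-01 08:30:00", "2022-09-02 14:05:10", "2022-09-03 23:59:59"),
  reading = c("12.5", "-999", "7")
)

clean <- raw %>%
  mutate(
    # Overwrite each variable with a cleaned-up version of itself
    id = as.numeric(id),
    # Parse year-month-day hour:minute:second text into a date-time
    measured_at = ymd_hms(measured_at),
    # Convert to numeric first, then treat the assumed code -999 as NA
    reading = na_if(as.numeric(reading), -999)
  )

str(clean)
```

Converting reading to numeric before calling na_if() keeps the value being compared (-999) the same type as the column.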
str_replace() will take a variable and replace an existing string with a new string.
case_when() allows us to set values when conditions are met.
What variables are displayed on this plot?
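As a rough illustration of str_replace() and case_when() inside mutate(), here is a small sketch. The data frame air, its values, and the cutoffs 50 and 100 are hypothetical and chosen only for the example. Note that str_replace() changes only the first match in each value; str_replace_all() changes every match.

```r
library(dplyr)
library(stringr)

# Hypothetical data for illustration only
air <- data.frame(
  city = c("Northampton MA", "Boston MA", "Albany NY"),
  aqi = c(42, 75, 160)
)

air_labeled <- air %>%
  mutate(
    # Replace the first occurrence of " MA" in each value
    city = str_replace(city, " MA", ", Massachusetts"),
    # Conditions are checked in order; the first TRUE one sets the value
    level = case_when(
      aqi <= 50 ~ "low",
      aqi <= 100 ~ "medium",
      TRUE ~ "high"  # fallback when no earlier condition is met
    )
  )

print(air_labeled)
```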
City column on the previous slide?
Use pivot_longer() to pivot a dataset from wider to longer format. pivot_longer() takes the following arguments:
cols =: Identify a series of columns to pivot. The names of those columns will become repeated rows in the pivoted data frame, and the values in those columns will be stored in a new column.
names_to =: Identify a name for the column where the column names will be stored.
values_to =: Identify a name for the column where the values associated with those names will be stored.
Note: I use pivot_wider() far less often than pivot_longer().
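The three pivot_longer() arguments can be seen together in a short sketch. The data frame wide and its column names (city, y2020, y2021) are invented for this example and do not come from the slides.

```r
library(tidyr)

# Hypothetical wide data: one column per year
wide <- data.frame(
  city = c("Northampton", "Boston"),
  y2020 = c(10, 20),
  y2021 = c(11, 21)
)

long <- pivot_longer(
  wide,
  cols = c(y2020, y2021),  # the columns to pivot
  names_to = "year",       # new column holding the old column names
  values_to = "value"      # new column holding the old cell values
)

print(long)
```

Each city now appears once per pivoted column, so the two-row input becomes four rows.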
Use pivot_wider() to pivot a dataset from longer to wider format. pivot_wider() takes the following arguments:
names_from =: Identify the column to get the new column names from.
values_from =: Identify the column to get the cell values from.
Use separate() to split a column into multiple columns. separate() takes the following arguments:
col: Identify the existing column to separate.
into = c(): Identify the names of the new columns.
sep =: Identify the characters or numeric position that indicate where to separate columns.
AQI on the previous slide into a numeric variable?
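For reference, pivot_wider() and separate() with the arguments listed above might look like the sketch below. The data frames long and places and all of their column names are assumptions made up for illustration.

```r
library(tidyr)

# Hypothetical long data: one row per city and year
long <- data.frame(
  city = c("Northampton", "Northampton", "Boston", "Boston"),
  year = c("y2020", "y2021", "y2020", "y2021"),
  value = c(10, 11, 20, 21)
)

# Spread the values of `year` into column names, filled from `value`
wide <- pivot_wider(long, names_from = year, values_from = value)

# Hypothetical column that packs two pieces of information together
places <- data.frame(location = c("Northampton, MA", "Albany, NY"))

# Split `location` at the comma-and-space into two new columns
split_places <- separate(
  places,
  col = location,
  into = c("city", "state"),
  sep = ", "
)

print(wide)
print(split_places)
```

By default the columns created by separate() are character vectors, whatever they contain.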