SDS 192: Introduction to Data Science
Lindsay Poirier
Statistical & Data Sciences, Smith College
Fall 2022
 
Use mutate() to overwrite a variable with a new, cleaned-up variable.
as.character(), as.numeric(), and as.logical() all convert a variable from its original type to a new type.

lubridate package
lubridate cheatsheet

ymd_hms() will take a date formatted as year, month, day, hour, minute, second and convert it to a date-time format.

NA values
na_if() will take a variable and set specified values to NA.
str_replace() will take a variable and replace an existing string with a new string.
case_when() allows us to set values when conditions are met.

What variables are displayed on this plot?
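As a minimal sketch, the cleaning functions above can be combined in a single mutate() call. The data frame raw, its column names, the -999 missing-value code, and the cutoffs used in case_when() are all made up for illustration; they are not taken from the course data.

```r
library(dplyr)
library(stringr)
library(lubridate)

# Hypothetical raw data: every column was read in as text
raw <- tibble(
  city      = c("Boston, MA", "Boston, MA", "Albany, NY"),
  timestamp = c("2022-09-01 08:30:00", "2022-09-01 14:00:00", "2022-09-02 09:15:00"),
  aqi       = c("42", "-999", "155")  # assume -999 was used as a code for "missing"
)

clean <- raw %>%
  mutate(
    # replace an existing string with a new string
    city      = str_replace(city, ", MA", ", Massachusetts"),
    # convert year-month-day hour:minute:second text to a date-time
    timestamp = ymd_hms(timestamp),
    # overwrite the text column with a numeric version of itself
    aqi       = as.numeric(aqi),
    # set the placeholder value to NA
    aqi       = na_if(aqi, -999),
    # set values when conditions are met (illustrative cutoffs only)
    level     = case_when(
      aqi <= 50  ~ "low",
      aqi <= 100 ~ "medium",
      aqi > 100  ~ "high"
    )
  )

clean
```

Rows where aqi is NA match none of the case_when() conditions, so level is also NA for those rows.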
City column on the previous slide?

Use pivot_longer() to pivot a dataset from wider to longer format:
pivot_longer() takes the following arguments:
  cols =: Identify a series of columns to pivot. The names of those columns will become repeated rows in the pivoted data frame, and the values in those columns will be stored in a new column.
  names_to =: Identify a name for the column where the column names will be stored
  values_to =: Identify a name for the column where the values associated with those names will be stored

Note: I use pivot_wider() far less often than pivot_longer()
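The three pivot_longer() arguments can be sketched on a small example. The data frame wide, its columns, and its values are hypothetical and exist only to show the reshaping.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per city, one column per year
wide <- tibble(
  city   = c("Boston", "Albany"),
  `2020` = c(10, 20),
  `2021` = c(11, 21)
)

long <- wide %>%
  pivot_longer(
    cols      = c(`2020`, `2021`),  # the columns to pivot
    names_to  = "year",             # new column holding the old column names
    values_to = "value"             # new column holding the old cell values
  )

long
```

Each city now appears once per pivoted column, so the two-row data frame becomes four rows. Note that year is stored as text, because it came from column names.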
Use pivot_wider() to pivot a dataset from longer to wider format:
pivot_wider() takes the following arguments:
  names_from =: Identify the column to get the new column names from
  values_from =: Identify the column to get the cell values from

Use separate() to split a column into multiple columns:
separate() takes the following arguments:
  col =: Identify the existing column to separate
  into = c(): Identify the names of the new columns
  sep =: Identify the characters or numeric position that indicate where to separate columns

AQI on the previous slide into a numeric variable?
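A minimal sketch of pivot_wider() followed by separate(), using the arguments listed above. The data frame long, its column names, and its readings are invented for illustration, and the "-" separator is an assumption about how the hypothetical location column is formatted.

```r
library(dplyr)
library(tidyr)

# Hypothetical long data: one row per location and measure
long <- tibble(
  location = c("Boston-MA", "Boston-MA", "Albany-NY", "Albany-NY"),
  measure  = c("pm25", "ozone", "pm25", "ozone"),
  reading  = c(8, 30, 6, 28)
)

wide <- long %>%
  # each distinct value of measure becomes its own column, filled from reading
  pivot_wider(names_from = measure, values_from = reading) %>%
  # split "Boston-MA" into a city column and a state column at the "-"
  separate(col = location, into = c("city", "state"), sep = "-")

wide
```

The result has one row per location, with columns city, state, pm25, and ozone.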