Iteration

SDS 192: Introduction to Data Science

Lindsay Poirier
Statistical & Data Sciences, Smith College

Fall 2022

For Today

  • Quiz 2 Posted!
  • Project 2 Due Today
  • For Loops
  • Performing operations across() variables
  • Family of map functions

For Loops

for (i in df$var_a) {
  print(i + 1)
}
[1] 3
[1] 4
[1] 5
[1] 6
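
The slides never show df itself, so the sketch below reconstructs it from the printed outputs (the values are inferred, not taken from the original data file). It also shows a common variant of the loop above, assuming we want to keep the results rather than only print them: pre-allocate a vector (here called result, a name chosen for illustration) and fill it by index.

```r
# Hypothetical reconstruction of `df`, inferred from the outputs in these slides
df <- data.frame(
  name  = c("obs1", "obs2", "obs3", "obs1"),
  var_a = c(2, 3, 4, 5),
  var_b = c(4, 7, 2, 1),
  var_c = c(4, 9, 3, 2)
)

# Pre-allocate a numeric vector with one slot per row
result <- numeric(nrow(df))

# Loop over positions (not values) so each result can be stored by index
for (i in seq_along(df$var_a)) {
  result[i] <- df$var_a[i] + 1
}

result
```

Looping over seq_along() rather than over the values themselves gives us an index for storing each result.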

For Loops

for (i in df |> select(var_a:var_c)) {
  print(sum(i))
}
[1] 14
[1] 14
[1] 18
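
The loop above works because a data frame is a list of columns, so each pass through the loop receives one whole column. A minimal sketch of the same idea that keeps track of which sum belongs to which column, assuming the reconstructed df below (inferred from the outputs in these slides) and the illustrative names num_cols and col_sums:

```r
library(dplyr)

# Hypothetical reconstruction of `df`, inferred from the outputs in these slides
df <- data.frame(
  name  = c("obs1", "obs2", "obs3", "obs1"),
  var_a = c(2, 3, 4, 5),
  var_b = c(4, 7, 2, 1),
  var_c = c(4, 9, 3, 2)
)

num_cols <- df |> select(var_a:var_c)

# Named numeric vector with one slot per column
col_sums <- numeric(ncol(num_cols))
names(col_sums) <- names(num_cols)

# Loop over column names so each sum can be stored under its name
for (col in names(num_cols)) {
  col_sums[col] <- sum(num_cols[[col]])
}

col_sums
```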

across()

  • Applies a function across multiple columns in a data frame
  • Takes as arguments the columns to operate on and the function to apply to each of them
df |>
  summarize(across(var_a:var_c, sum))
df |>
  summarize(across(contains("var"), mean))
df |>
  mutate(across(where(is.numeric), as.character))
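
As a further sketch of the calls above, across() can also take a named list of functions or an anonymous function. The example assumes the reconstructed df below (inferred from the outputs in these slides); col_summary and doubled are illustrative names. The \(x) shorthand requires R 4.1 or later.

```r
library(dplyr)

# Hypothetical reconstruction of `df`, inferred from the outputs in these slides
df <- data.frame(
  name  = c("obs1", "obs2", "obs3", "obs1"),
  var_a = c(2, 3, 4, 5),
  var_b = c(4, 7, 2, 1),
  var_c = c(4, 9, 3, 2)
)

# Several functions at once: one output column per (column, function) pair
col_summary <- df |>
  summarize(across(var_a:var_c,
                   list(total = sum, avg = mean),
                   .names = "{.col}_{.fn}"))

# An anonymous function applied to every numeric column
doubled <- df |>
  mutate(across(where(is.numeric), \(x) x * 2))

col_summary
doubled
```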

purrr package

  • Included in tidyverse
  • Package for working with functions and vectors
  • Provides a family of map() functions
  • map() functions allow us to apply a function to each element of a list or vector

Single Column Data Frames vs. Vectors

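  • select() returns a data frame, even when it keeps only one column
  • To extract a column from a data frame as a vector, we use pull() (or $)
df |> select(var_a)   # still a data frame, with one column
df$var_a              # a vector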
[1] 2 3 4 5
df |> select(var_a) |> pull()
[1] 2 3 4 5
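
One way to check the difference described above is to test the type of each result. This sketch assumes the reconstructed df below (inferred from the outputs in these slides); one_col and a_vec are illustrative names.

```r
library(dplyr)

# Hypothetical reconstruction of `df`, inferred from the outputs in these slides
df <- data.frame(
  name  = c("obs1", "obs2", "obs3", "obs1"),
  var_a = c(2, 3, 4, 5),
  var_b = c(4, 7, 2, 1),
  var_c = c(4, 9, 3, 2)
)

one_col <- df |> select(var_a)  # single-column data frame
a_vec   <- df |> pull(var_a)    # plain numeric vector

is.data.frame(one_col)
is.data.frame(a_vec)
identical(a_vec, df$var_a)
```

This matters for the map() functions that follow: mapping over a data frame iterates over its columns, while mapping over a vector iterates over its elements.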

map()

  • Applies a function to each element of a vector and returns the results as a list
add_five <- function(x){
  x + 5
}

map(df$var_a, add_five)
[[1]]
[1] 7

[[2]]
[1] 8

[[3]]
[1] 9

[[4]]
[1] 10
add_five <- function(x){
  x + 5
}

a_vec <- df |> select(var_a) |> pull()
map(a_vec, add_five)
[[1]]
[1] 7

[[2]]
[1] 8

[[3]]
[1] 9

[[4]]
[1] 10
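
The same call can be written without naming a helper function first. A minimal sketch, assuming the var_a values shown in the slides and using R's \(x) shorthand for an anonymous function (R 4.1 or later); as_list and as_vec are illustrative names:

```r
library(purrr)

# Values of df$var_a as shown in the slides
var_a <- c(2, 3, 4, 5)

# map() always returns a list, one element per input
as_list <- map(var_a, \(x) x + 5)

# map_dbl() applies the same function but returns a numeric vector
as_vec <- map_dbl(var_a, \(x) x + 5)

as_list
as_vec
```

For simple arithmetic like this, var_a + 5 gives the same vector directly; map() becomes useful when the function cannot be applied to a whole vector at once.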

Setting Names

  • set_names() sets the names of the elements in a vector or list
add_five <- function(x){
  x + 5
}

map(df$var_a, add_five) |>
  set_names(df$name)
$obs1
[1] 7

$obs2
[1] 8

$obs3
[1] 9

$obs1
[1] 10
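
Names can also be set on the input before mapping, since map() carries the input's names through to its output. A sketch under the assumption that df matches the reconstruction below (inferred from the outputs in these slides); named_result is an illustrative name:

```r
library(purrr)

# Hypothetical reconstruction of `df`, inferred from the outputs in these slides
df <- data.frame(
  name  = c("obs1", "obs2", "obs3", "obs1"),
  var_a = c(2, 3, 4, 5),
  var_b = c(4, 7, 2, 1),
  var_c = c(4, 9, 3, 2)
)

# Name the input vector first; map() preserves those names in the result
named_result <- df$var_a |>
  set_names(df$name) |>
  map(\(x) x + 5)

names(named_result)
```

Because obs1 appears twice in df$name, the names are not unique, and named_result$obs1 returns only the first match.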

Family of Map Functions

  • map_int() returns an integer vector (TRUE is stored as 1)
df |>
  select(var_a:var_c) |>
  map_int(is.numeric)
var_a var_b var_c 
    1     1     1 
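  • map_chr() returns a character vector (in purrr 1.0.0 and later, converting logical results this way raises a deprecation warning)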
df |>
  select(var_a:var_c) |>
  map_chr(is.numeric)
 var_a  var_b  var_c 
"TRUE" "TRUE" "TRUE" 
  • map_lgl() returns a logical vector
df |>
  select(var_a:var_c) |>
  map_lgl(is.numeric)
var_a var_b var_c 
 TRUE  TRUE  TRUE 
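
The family also includes map_dbl(), which returns a double (numeric) vector. A short sketch, assuming the reconstructed df below (inferred from the outputs in these slides); col_means is an illustrative name:

```r
library(dplyr)
library(purrr)

# Hypothetical reconstruction of `df`, inferred from the outputs in these slides
df <- data.frame(
  name  = c("obs1", "obs2", "obs3", "obs1"),
  var_a = c(2, 3, 4, 5),
  var_b = c(4, 7, 2, 1),
  var_c = c(4, 9, 3, 2)
)

# Mapping over a data frame iterates over its columns;
# map_dbl() collects one number per column into a named vector
col_means <- df |>
  select(var_a:var_c) |>
  map_dbl(mean)

col_means
```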

Returning a Data Frame

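  • map() returns a list (here, one data frame per unique name)
create_total_col <- function(x){
  # Keep only the rows for one name, then add a row total
  df |>
    filter(name == x) |>
    mutate(total = var_a + var_b + var_c)
}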

map(unique(df$name), create_total_col)
[[1]]
  name var_a var_b var_c total
1 obs1     2     4     4    10
2 obs1     5     1     2     8

[[2]]
  name var_a var_b var_c total
1 obs2     3     7     9    19

[[3]]
  name var_a var_b var_c total
1 obs3     4     2     3     9
  • map_df() returns a single data frame (binding the rows of the list)
create_total_col <- function(x){
  # Keep only the rows for one name, then add a row total
  df |>
    filter(name == x) |>
    mutate(total = var_a + var_b + var_c)
}

map_df(unique(df$name), create_total_col)
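
An equivalent way to get one data frame is to call map() and then bind the rows of the resulting list with dplyr's bind_rows(). This is a sketch, not the slides' own method, and it assumes the reconstructed df below (inferred from the outputs in these slides); combined is an illustrative name. In purrr 1.0.0 and later, map_df() is marked as superseded and map() followed by list_rbind() is the suggested replacement.

```r
library(dplyr)
library(purrr)

# Hypothetical reconstruction of `df`, inferred from the outputs in these slides
df <- data.frame(
  name  = c("obs1", "obs2", "obs3", "obs1"),
  var_a = c(2, 3, 4, 5),
  var_b = c(4, 7, 2, 1),
  var_c = c(4, 9, 3, 2)
)

create_total_col <- function(x) {
  # Keep only the rows for one name, then add a row total
  df |>
    filter(name == x) |>
    mutate(total = var_a + var_b + var_c)
}

# map() builds a list of data frames; bind_rows() stacks them into one
combined <- unique(df$name) |>
  map(create_total_col) |>
  bind_rows()

combined
```

The rows come back grouped by name (both obs1 rows first), in the order the names are mapped over.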

Iterating Over Multiple Vectors

add_two_vectors <- function(x, y){
  x + y
}

map2(df$var_a, df$var_b, add_two_vectors)
[[1]]
[1] 6

[[2]]
[1] 10

[[3]]
[1] 6

[[4]]
[1] 6
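
map2() also has typed variants, and pmap() extends the idea to three or more inputs. A minimal sketch, assuming the reconstructed df below (inferred from the outputs in these slides); sum_ab and sum_abc are illustrative names:

```r
library(purrr)

# Hypothetical reconstruction of `df`, inferred from the outputs in these slides
df <- data.frame(
  name  = c("obs1", "obs2", "obs3", "obs1"),
  var_a = c(2, 3, 4, 5),
  var_b = c(4, 7, 2, 1),
  var_c = c(4, 9, 3, 2)
)

# map2_dbl(): walk two vectors in parallel, return a numeric vector
sum_ab <- map2_dbl(df$var_a, df$var_b, \(x, y) x + y)

# pmap_dbl(): walk any number of vectors in parallel, passed as a list
sum_abc <- pmap_dbl(
  list(df$var_a, df$var_b, df$var_c),
  \(a, b, c) a + b + c
)

sum_ab
sum_abc
```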