R Fundamentals

SDS 192: Introduction to Data Science

Lindsay Poirier
Statistical & Data Sciences, Smith College

Fall 2022

For Today

  • Nouns: Data objects in R
  • Verbs: R Functions
  • Conjunctions: R Operators
  • Missing Values and R Functions
  • Exercise

Things to Know Right Up Front

  • R is case-sensitive: df is not the same object as DF

Data Objects in R

Values vs. Vectors vs. Data Frames

  • Value: a single data point
  • R understands values to be of a certain type:
    • numeric: 3.29
    • integer: 3L (a bare 3 is stored as numeric)
    • character: “SDS 192”
    • logical: TRUE/FALSE
    • date-time: 3/12/92 01:23:01
  • Vector: a 1-dimensional data object, listing a series of values
  • all values in a vector share the same type
  • a vector is defined by listing entries (separated by commas) in the function c() (shorthand for combine)
vector_example <- c(1, 5, 6, 7)
vector_example
[1] 1 5 6 7
  • Data frame: a two-dimensional (rectangular) data object
  • Every column in a data frame is a vector
  • Column names act as a variable name for that vector (access via the $ accessor)
  • I (Lindsay) use df to denote a data frame.
df
  col1  col2 col3
1    1  TRUE    a
2    5 FALSE    b
3    6  TRUE    c
4    7  TRUE    d
df$col1
[1] 1 5 6 7
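
A minimal sketch of how a data frame like the one printed above could be built. The column values are copied from the printed output; the use of data.frame() here is an assumption, since the slides do not show how df was created.

```r
# Build a data frame matching the printed df
# (this construction is assumed, not taken from the slides)
df <- data.frame(
  col1 = c(1, 5, 6, 7),
  col2 = c(TRUE, FALSE, TRUE, TRUE),
  col3 = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE  # keep col3 as character in older R versions
)

# Each column is a vector, and all values in it share one type
class(df$col1)  # "numeric"
class(df$col2)  # "logical"
class(df$col3)  # "character"
```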

Assigning Objects to Variable Names

  • <- symbol assigns a value to a variable

  • Variable names should be descriptive! Poor or confusing variable names include:

    • a and data1: Be descriptive!

    • student.test.scores: Avoid periods!

    • student test scores: Use separator characters!

    • 3rd_test: Variables can’t start with numbers!

  • This course: snake case (lower case with words separated by underscores)
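
As an illustration, the confusing names listed above might be rewritten in snake case like this. The values are made up for the example.

```r
# Snake case: lower case, words separated by underscores
student_test_scores <- c(88, 92, 79)  # hypothetical scores

# A name cannot start with a number, so spell it out instead of 3rd_test
third_test <- 92

student_test_scores
third_test
```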

Learning check

What kind of object is this in R? What is its type?

temps <- c(47.3, 55.6, 48.3)

Learning check

What would happen if I were to do the following in R?

val <- 34
val <- val + 1
  • This is called overwriting a variable.

Where can I find these data objects in R?

  • Objects in R will be listed in the Environment tab in the upper right hand corner of RStudio.

  • Removing unnecessary objects from the environment can free up space!

rm(vector_example)

Functions in R

What is a function?

  • Think of functions like imperative sentences (e.g. “go”, “stay”, or “sleep”)
  • Indicate that you want R to take an action
  • Typically immediately followed by open and closed parentheses
  • What were some functions referenced in this week’s reading?

Arguments

  • Imagine I requested someone to “close” or “bring”
    • Their next questions might be “close what?” or “bring what?”, and I might say back “close the door” or “bring dessert”
  • Specify the subject of the function, along with additional information needed to run the function
  • Listed inside of the parentheses
  • Some arguments are required. Others are optional.
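
A small sketch of required versus optional arguments, using base R's log() as the example (the choice of function is ours, not from the slides). Its first argument x is required; base is optional and defaults to the natural logarithm.

```r
# Only the required argument: natural log of 100
natural_log <- log(100)

# Required argument plus the optional base argument: log base 10 of 100
log_base_ten <- log(100, base = 10)

natural_log   # about 4.605
log_base_ten  # 2
```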

Finding Help

  • Typing ?FUNCTION_NAME into the Console loads info about that function

?round()

  • Which arguments are required?
  • Which arguments are optional?

Learning check

Convert the following variable name into something descriptive in snake case

a <- round(pi, digits = 2)

Run the code in your Console. How can we find this variable in RStudio once we run this code?

Helpful Functions in R

Helpful Value Operations

R can work just like a calculator!

a <- 2
b <- 3

sum(a,b)
[1] 5

Why does this produce an error?

c <- "3"
sum(c, c)
Error in sum(c, c): invalid 'type' (character) of argument

R can concatenate strings!

word1 <- "Harry"
word2 <- "Sally"
paste("When", word1, "Met", word2, sep = " ")
[1] "When Harry Met Sally"

Helpful Vector Functions

  • class() returns the class of the values in a vector
  • length() returns the number of values in a vector
  • is.na() for each value, returns whether the value is an NA value
  • sum() returns the sum of the values in a vector
  • max() returns the maximum value in a vector
  • rank() returns the rank of each value in a vector
  • unique() returns the unique values of a vector
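
To make the list above concrete, here is each function applied to one small vector. The vector scores and its values are hypothetical.

```r
scores <- c(10, 30, 20, 30)  # hypothetical values

class(scores)   # "numeric"
length(scores)  # 4
is.na(scores)   # FALSE FALSE FALSE FALSE
sum(scores)     # 90
max(scores)     # 30
rank(scores)    # 1.0 3.5 2.0 3.5 (tied values share the average rank)
unique(scores)  # 10 30 20
```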

Learning Check

How would I find the sum of the third column in this data frame, which I have named df?

  col1 col2 col3
1    1    2    3
2    5    4    6
3    7    6    9

Helpful Data Frame Functions

  • View(): Opens a tab to view the data frame as a table
  • head(): returns the first six rows of the dataset (by default)
  • names(): returns the dataset’s column names
  • nrow(): returns the number of rows in the dataset
  • ncol(): returns the number of columns in the dataset
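
A quick sketch of these functions on a small, made-up data frame. The name quiz_df and its columns are hypothetical.

```r
# Hypothetical data frame with eight rows and two columns
quiz_df <- data.frame(
  id = 1:8,
  score = c(90, 85, 70, 95, 60, 88, 77, 92)
)

head(quiz_df)   # first six rows (ids 1 through 6)
names(quiz_df)  # "id" "score"
nrow(quiz_df)   # 8
ncol(quiz_df)   # 2

# View(quiz_df) opens the table in a new RStudio tab; it is commented out
# here because it only works in an interactive session
```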

Do I really have to memorize all of these functions?!

Operators in R

Operators in R

  • Symbols that communicate what operations to perform in R
  • Includes calculator symbols: +, -, *, /, ^
  • Includes relational symbols: <, <=, >, >=, ==, !=
  • Includes logical symbols: & (AND), | (OR), ! (NOT)
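
The three groups of operators can be tried out on two values. This is only a sketch, reusing the a and b values from the calculator example earlier.

```r
a <- 2
b <- 3

# Calculator symbols
a + b  # 5
a * b  # 6
a ^ b  # 8

# Relational symbols return logical values
a < b   # TRUE
a == b  # FALSE
a != b  # TRUE

# Logical symbols combine or negate logical values
(a < b) & (b > 5)  # FALSE: both sides must be TRUE
(a < b) | (b > 5)  # TRUE: one side is enough
!(a < b)           # FALSE
```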

Pipe Operator in R

  • Symbol is |> (the native pipe, available in R 4.1 and later; the older pipe from the magrittr package is %>%)

    Without Pipe

    • Functions are nested as arguments in R

    • length(unique(df$col1))

    • Functions are performed from the innermost to the outermost

    With Pipe

    • Functions are sequenced in R

    • df$col1 |> unique() |> length()

    • Take this data object, and then perform this function, and then perform this function
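
A runnable sketch comparing the two styles. The vector below is hypothetical and stands in for df$col1; the |> pipe assumes R 4.1 or later.

```r
col1 <- c(1, 5, 5, 7, 1)  # hypothetical values standing in for df$col1

# Without pipe: read from the inside out
nested_count <- length(unique(col1))

# With pipe: read left to right
piped_count <- col1 |> unique() |> length()

nested_count                          # 3
identical(nested_count, piped_count)  # TRUE
```

Both versions count the distinct values; the pipe only changes the order in which the steps are written.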

Missing Values and R Functions

Missing Values

  • Remember that missing values still have a position in rectangular datasets
  • Missing values get recorded as NA in R
  • …but sometimes analysts put words or numbers in their datasets to indicate missingness:
    • “NONE”
    • -999
    • “” (an empty string; this is the most challenging to uncover!)
  • …but what happens when we try to perform functions on vectors that contain missing values?

Missing Values in Math Functions

We can use na.rm = TRUE to ignore NA values in math functions.

vals <- c(1, 2, NA, 4, NA, 6)
sum(vals)
[1] NA
sum(vals, na.rm = TRUE)
[1] 13
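
The same argument works in other math functions. The sketch below also shows one way to handle a placeholder such as -999: recode it to NA first, so that na.rm = TRUE can skip it. The readings vector is hypothetical.

```r
vals <- c(1, 2, NA, 4, NA, 6)

mean(vals)                # NA
mean(vals, na.rm = TRUE)  # 3.25
max(vals, na.rm = TRUE)   # 6
sum(is.na(vals))          # 2, the number of missing values

# A placeholder like -999 is treated as a real number until it is recoded
readings <- c(47.3, -999, 48.3)  # hypothetical values
readings[readings == -999] <- NA
mean(readings, na.rm = TRUE)     # 47.8
```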