11 Control Flow

11.1 If/Else statements

11.1.1 Traditional

Sometimes, you will want your code to perform different actions depending on something’s value. In these circumstances, it is useful to implement if...else statements. if statements work as such:

if some test/condition evaluates to TRUE, execute some specific code.

Additional tests/conditions can be appended with else statements, which specify what to do when the original test/condition evaluates to FALSE, optionally depending on further tests/conditions. The syntax of a basic if statement is as follows:

if(some test/condition to evaluate) {
  code for what to do if that test/condition evaluates to true
}

For example:

x = 4

if(3 < x) {
  print("The condition evaluated TRUE")
}
#> [1] "The condition evaluated TRUE"

x was set equal to 4. R evaluates the test 3 < x in the if statement, here equivalent to 3 < 4, which evaluates to TRUE. Since the if condition evaluates to TRUE, R runs the code in the curly brackets.

x may not always be 4 though. What if x was not 4? What if you do not know what x is? If the test in the if statement evaluates to FALSE, nothing happens. If nothing at all happens, you may not know if there was an error in the code or the test just evaluated to FALSE. Considering this, it is always good practice to set up an alternative for when the test evaluates to FALSE. That is where else statements come in.

The code below will change x to be a random value, so its actual value will be unknown.

x = sample(c(1:6), 1) # From the values 1 to 6, sample 1 value

if(3 < x) {
  print("The condition evaluated TRUE")
} else {
  print("the condition evaluated FALSE")
}
#> [1] "the condition evaluated FALSE"
x
#> [1] 2

This code specifies what to do when the test evaluates to TRUE and what to do when it evaluates to FALSE.

You can get more specific and link several conditions together. You may want more than 2 options, not just something to do if a test is TRUE and something else to do in all other cases. To add an option, you use else if() in place of else and specify another test.

x = sample(c(1:6), 1)

if(4 < x) {
  print("The first condition evaluated TRUE")
} else if (2 < x && x < 5) {
  print("The second condition evaluated TRUE")
} else {
  print("Neither the first or second condition evaluated TRUE.")
}
#> [1] "Neither the first or second condition evaluated TRUE."
x
#> [1] 2

Writing if statements like this is most useful when you want to run some code, such as a function call, rather than simply return a value. As you will note above, in all instances the print() function was the code being executed (what is in the curly brackets). Any number of else if conditions can be chained together in an if...else chain.
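To see every branch of such a chain run, the same tests can be wrapped in a small helper and called with fixed values instead of a random draw. This is only an illustrative sketch: describe_x() is a hypothetical helper name, and the shortened labels are choices made for this example.

```r
# describe_x() is a hypothetical helper wrapping the if...else if...else chain
describe_x = function(x) {
  if (4 < x) {
    "first"   # Returned when the first test is TRUE
  } else if (2 < x && x < 5) {
    "second"  # Only checked when the first test is FALSE
  } else {
    "neither" # Fallback when every test is FALSE
  }
}

describe_x(6)
#> [1] "first"
describe_x(3)
#> [1] "second"
describe_x(1)
#> [1] "neither"
```

Each test is only reached when all the tests before it evaluated to FALSE, which is why a value of 6 never reaches the second test.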

There are two alternative ways to write if...else statements.

11.1.2 ifelse()

ifelse() is most useful when you need to return values rather than execute some other code/function (like printing a character string).

ifelse() statements take the form:

ifelse(test, the value to return if the test evaluates TRUE, the value to return if the test evaluates FALSE)

Multiple ifelse() statements can be chained together, akin to an else if, by adding a nested ifelse() call in place of the FALSE argument.

The example below demonstrates this:

x = sample(c(1:6), 1)

ifelse(4 < x, "The first condition evaluated true.",
       ifelse(2 < x & x < 5, "The second condition evaluated true.", 
              "Neither the first or second condition evaluated true."))
#> [1] "The second condition evaluated true."
x
#> [1] 4
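
Unlike a traditional if statement, ifelse() tests every element of a vector, so the same nested call can label several values in one step. A minimal sketch, using the fixed values 1 through 6 in place of the random draw and shortened labels (both are choices made for this illustration):

```r
x = 1:6 # Fixed values instead of a random sample

# The nested ifelse() is applied to each element of x in turn
result = ifelse(4 < x, "first",
                ifelse(2 < x & x < 5, "second", "neither"))

result
#> [1] "neither" "neither" "second"  "second"  "first"   "first"
```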

11.1.3 case_when()

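case_when(), from the dplyr package, is just a different way to formulate chains of ifelse() calls, and is most useful when you have many nested tests/conditions to specify. Each line pairs a test with the value to return, separated by ~, and a final TRUE line acts as the catch-all else.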

library(dplyr) # case_when() comes from the dplyr package

x = sample(c(1:6), 1)

case_when(
  x < 4 ~ "The first condition evaluated true.",
  2 < x & x < 5 ~ "The second condition evaluated true.",
  TRUE ~ "Neither the first or second condition evaluated true."
)
#> [1] "The first condition evaluated true."
x
#> [1] 3
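
Because case_when() checks its conditions in order and returns the value for the first one that is TRUE, the order of the lines matters. A brief sketch, assuming the dplyr package is installed, and using the fixed values 1 through 6 with shortened labels for illustration:

```r
library(dplyr) # case_when() is provided by dplyr

x = 1:6 # Fixed values instead of a random sample

result = case_when(
  x < 4 ~ "first",          # Checked first
  2 < x & x < 5 ~ "second", # Only used where the first test was FALSE
  TRUE ~ "neither"          # Catch-all for everything else
)

result
#> [1] "first"   "first"   "first"   "second"  "neither" "neither"
```

Note that 3 satisfies both the first and the second test, but it is labelled "first" because that test comes earlier.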

In sum:

  • Traditional if...else statements are useful when you need the result to execute some code.
  • ifelse() and case_when() are useful when you need the result to be a specific value and are often used to create new data or variables.

11.2 Loops

Loops are used to repeat certain code iteratively, for example when you want to apply the same code to each element in a sequence (e.g., columns in a dataframe, elements in a vector, etc). The basic syntax of a for loop is as follows:

for (val in sequence) {
  code to be executed
}

The for keyword initiates the for loop. The name val is completely arbitrary and can be replaced with any valid variable name. Conventionally it is just the letter i, and subsequently j then k if you are writing nested for loops (loops within loops).
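
As a brief illustration of the i and j convention in nested loops, the sketch below fills a small matrix one cell at a time. The matrix m and its 3 by 3 size are made up for this example.

```r
m = matrix(0, nrow = 3, ncol = 3) # Empty 3 x 3 matrix to fill in

for (i in 1:nrow(m)) {   # Outer loop: one pass per row
  for (j in 1:ncol(m)) { # Inner loop: one pass per column within that row
    m[i, j] = i * j      # Cell in row i, column j
  }
}

m
#>      [,1] [,2] [,3]
#> [1,]    1    2    3
#> [2,]    2    4    6
#> [3,]    3    6    9
```

The inner loop runs to completion for every single value of i, so the assignment is executed 9 times in total.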

For a simple use case, imagine the following scenario:

You are a UGIA and are helping the professor with an exam. You have a series of exam scores c(1:10). The professor was feeling generous and wants to curve the scores by 1 point. It would be pretty annoying to have to try and manually change each value. Instead, you can do this automatically with a for loop!

x = c(1:10) # Exam scores

for (i in 1:length(x)) {
  x[i] = x[i] + 1 # Set the ith X to be equal to itself + 1
  # This will be iterated through each value in x
}

x # Look at output to verify changes
#>  [1]  2  3  4  5  6  7  8  9 10 11

Breaking down the code above step by step: First a for loop was initiated, saying you wanted to iterate over each element in the sequence 1 to length(x). The length() function returns the number of elements in the object you pass it. Then it was specified that the ith element of x should be replaced with the value resulting from the sum of that value + 1 (x[i] + 1). The value of i will change in each iteration of the loop. It starts with 1 (because that is what the code tells it to do with the 1: part), and increments by 1 each iteration, iterating length(x) times.

1:length(x) was used instead of just 1:10 (10 being the number of elements in the vector x) above to keep the code dynamic. This illustrates an important coding principle: soft coding vs hard coding. Hard coding is static and unchanging, whereas soft coding is dynamic. What does this mean? Well, x may not always have 10 exam scores. Maybe you have some students who take their exams with OSD, and you have to wait a few days to get their exams back. You want to be able to run the same code without making any modifications. If 1:10 is used in the for loop, then when the new exam scores are added to x, the code won’t run on all exams! The for loop is specified to explicitly iterate over the range 1:10. However, by using 1:length(x), length(x) will always be replaced by the exact number of elements in the vector x! This way, the same code can be used no matter how many exam scores you have! Generally speaking, you always want to soft code and make your code dynamic.
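
One caveat with 1:length(x) is worth knowing: if x is ever empty, 1:length(x) becomes 1:0, which counts down through 1 and 0 instead of producing an empty sequence. Base R's seq_along() is a common soft-coded alternative that handles this case. A minimal sketch (the name empty is made up for this illustration):

```r
x = c(1:10) # Exam scores

for (i in seq_along(x)) { # seq_along(x) gives 1, 2, ..., length(x)
  x[i] = x[i] + 1
}

x
#>  [1]  2  3  4  5  6  7  8  9 10 11

empty = numeric(0) # A vector with no exam scores yet

1:length(empty)    # Counts down from 1 to 0, so a loop would run twice
#> [1] 1 0
seq_along(empty)   # Empty sequence, so a loop would not run at all
#> integer(0)
```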

for loops are often combined with if statements to apply conditional code iteratively through your data.

Artwork by Horst (2022)

Imagine that instead of needing to add a bonus point to every exam, you need to give particular students a bonus if they completed a SONA experiment for extra credit.

This can be accomplished by adding an if statement to the code executed in each iteration:

y = data.frame("Exam" = c(1:4), 
               "Score" = c(88,90,77,98), 
               "Student" = c("Dave", "Ally", 
                             "Tyreek", "Jeanie"), 
               "Sona" = c(0,1,1,0))

y
#>   Exam Score Student Sona
#> 1    1    88    Dave    0
#> 2    2    90    Ally    1
#> 3    3    77  Tyreek    1
#> 4    4    98  Jeanie    0

for (i in 1:nrow(y)) {
  # Use nrow() to count the rows of a dataframe
  if (y$Sona[i] == 1) {
    # $ to index: you want the y dataframe,
    # the Sona column, and the ith row.
    y$Score[i] = y$Score[i] + 5
    # If the condition y$Sona[i] == 1 evaluates to TRUE,
    # the ith value in the Score column of the y dataframe
    # becomes what is currently there + 5.
  }
}

The same task can be accomplished using ifelse(), since the goal here is to return values:

y = data.frame("Exam" = c(1:4), 
               "Score" = c(88,90,77,98), 
               "Student" = c("Dave", "Ally", 
                             "Tyreek", "Jeanie"), 
               "Sona" = c(0,1,1,0))

for (i in 1:nrow(y)) {
  y$Score[i] = ifelse(y$Sona[i] == 1, # Test
                      y$Score[i]+5, # What to do if TRUE
                      y$Score[i]) # What to do if FALSE
}

y
#>   Exam Score Student Sona
#> 1    1    88    Dave    0
#> 2    2    95    Ally    1
#> 3    3    82  Tyreek    1
#> 4    4    98  Jeanie    0

You can see that only the scores of Ally and Tyreek, the students who completed the SONA extra credit, have changed.
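
Because ifelse() works on whole vectors, the same update can also be written without a loop. A minimal sketch, rebuilding the y dataframe so the block runs on its own:

```r
y = data.frame("Exam" = c(1:4),
               "Score" = c(88,90,77,98),
               "Student" = c("Dave", "Ally",
                             "Tyreek", "Jeanie"),
               "Sona" = c(0,1,1,0))

# Test every row at once: add 5 where Sona is 1, keep the score otherwise
y$Score = ifelse(y$Sona == 1, y$Score + 5, y$Score)

y$Score
#> [1] 88 95 82 98
```

The loop version is still useful for seeing what happens row by row, but the vectorized form is shorter and is a common way to create or update a column.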