Show code
library(tidyverse)
library(knitr)
library(rlang)
Yang Wu
January 4, 2020
The following packages are needed:
Loops are an important programming concept, enabling programmers to execute blocks of code repeatedly, usually with varying options. This post will cover three types of loops— for, while, and repeat. We will then solve some problems using loops to demonstrate the power of iteration in R programming. Whenever possible, we will attempt to solve problems using different methods, including different types of loops and parallel processing. Many of R’s functions are vectorized, meaning that the function will operate on all elements of a vector without needing to loop through and act on each element one at a time. We will leverage this unique feature of R to show that many problems that seem to involve loops can actually be solved differently in R, although the programs may be harder to intuit.
For more readings on control flows in R, I suggest starting with Hadley Wickham’s Advance R and Introduction to Scientific Programming and Simulation Using R.
Basic syntax:
For each item in vector, perform_action is called once; updating the value of item each time. There are two ways to terminate a for loop early:
[1] 3
[1] 4
[1] 5
[1] 6
seq_along(x)
to generate the sequence in for()
since it always returns a value the same length as x, even when x is a length zero vector:[[
to work around this caveat:[1] "2020-01-11"
[1] "2010-01-11"
Basic syntax:
When a while command is executed, logical_expression is evaluated first. If it is true, then the group expressions in {} is executed. Control is then passed back to the start of the command: if logical_expression is still TRUE then the grouped expressions are executed again, and so on. For the loop to stop, logical_expression must eventually be FALSE. To achieve this, logical_expression usually depends on a variable that is altered within the grouped expressions.
Basic syntax:
It is a simple loop that will run the same statement or a group of statements repeatedly until the stop condition has been encountered. Repeat loop does not have any condition to terminate the loop, a programmer must specifically place a condition within the loop’s body and use the declaration of a break statement to terminate this loop. If no condition is present in the body of the repeat loop then it will iterate infinitely.
ceiling()
takes a single numeric argument x and returns a numeric vector containing the smallest integers not less than the corresponding elements of x. On the other hand, floor()
takes a single numeric argument x and returns a numeric vector containing the largest integers not greater than the corresponding elements of x.1.01 kB
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108
[109] 109 110 111 112 113 114 115 116 117 118 119 120
1.02 kB
# Time
time <- (0:n) * period
# Plot
ggplot(data = tibble(time, pension), mapping = aes(x = time, y = pension)) +
geom_point(color = "orange") +
labs(
title = "Forecast of Pension Value",
x = "Time (years)", y = "Pension Value ($)"
) +
theme(
panel.background = element_rect(fill = "grey97"),
panel.grid = element_blank()
)
Fixed-payment loan will be repaid in 13.25 years
Consider the function \(y=f(x)\) defined by
When \(x\leq 0\), \(f(x)=-x^{3}\)
When \(x\in(0,1]\), \(f(x)=x^{2\)
When \(x>1\), \(f(x)=\sqrt{x}\)
# Define x
x_vals <- seq.int(from = -2, to = 2, by = 0.1)
# Initialize sequence
seq <- seq_along(x_vals)
# Pre-allocate container for y values
y_vals <- vector(mode = "double", length = length(x_vals))
# For loop
for (i in seq) {
# Set x values
x <- x_vals[[i]]
if (x <= 0) {
y <- -x^3
} else if (x > 0 & x <= 1) {
y <- x^2
} else if (x > 1) {
y <- sqrt(x)
}
# Compute y values and store in the container vector
y_vals[[i]] <- y
}
# Plot the function
ggplot(data = tibble(x_vals, y_vals)) +
geom_line(mapping = aes(x = x_vals, y = y_vals), color = "blue") +
labs(
title = "Piecewise Function",
x = "x", y = "y"
) +
theme(
panel.background = element_rect(fill = "grey97"),
panel.grid = element_blank()
)
case_when()
(Note that the function is \(-x^3\) when \(x \leq 0\); hence the negative sign in front of x) [1] 8.000000 6.859000 5.832000 4.913000 4.096000 3.375000 2.744000 2.197000
[9] 1.728000 1.331000 1.000000 0.729000 0.512000 0.343000 0.216000 0.125000
[17] 0.064000 0.027000 0.008000 0.001000 0.000000 0.010000 0.040000 0.090000
[25] 0.160000 0.250000 0.360000 0.490000 0.640000 0.810000 1.000000 1.048809
[33] 1.095445 1.140175 1.183216 1.224745 1.264911 1.303840 1.341641 1.378405
[41] 1.414214
[1] 1.428571
[1] 4243336
[1] 9
[1] 1.428571
[1] 4243336
[1] 47
# Function
sum_of_sequence_vectorized <- function(x, n) {
# Create vector of x
vector_of_x <- rep(x = x, times = n + 1)
# Create vector of exponents
vector_of_exponents <- seq.int(from = 0, to = n, by = 1)
# Create vector of terms in the sequence
vector_of_terms <- vector_of_x^vector_of_exponents
# Find the sum
sum(vector_of_terms)
}
# Test
sum_of_sequence_vectorized(x = 0.3, n = 55)
[1] 1.428571
[1] 4243336
[1] 47
The geometric mean of a vector is defined as follows:
\[\begin{align*} \left(\prod_{i=1}^{n} x_{i}\right)^{\frac{1}{n}}=\sqrt[n]{x_{1} x_{2} \cdots x_{n}} \end{align*}\]
geometric_for_loop <- function(x) {
# Length of vector
n <- length(x)
# Warning
if (is.numeric(x) == FALSE) {
abort("Vector is of the wrong type; input must be numeric")
} else if (n < 2) {
abort("Input vector must contain more than 1 element")
}
# Initialize first term (as.double() ensures no integer overflow)
x_val <- as.double(x[[1]])
# Iterate over the sequence 1:(n - 1)
# The algorithm involves multiplying the current element i by the next (i + 1) element in x
# Setting (n - 1) as the last item safeguards against out-of-bounds subsetting of "x"
seq <- 1:(n - 1)
# Iterate
for (i in seq) {
x_val <- x_val * x[[i + 1]]
}
# Geometric mean
(x_val)^(1 / n)
}
# Test
# Create a random vector
x <- sample(x = 1:45, size = 200, replace = TRUE)
# A function from the psych package
psych::geometric.mean(x)
[1] 18.64757
[1] 18.64757
geometric_vectorization <- function(x) {
# Length of vector
n <- length(x)
# Warning
if (is.numeric(x) == FALSE) {
abort("Vector is of the wrong type; input must be numeric")
} else if (n < 2) {
abort("Input vector must contain more than 1 element")
}
# Product of vector elements
# The function prod() is primitive
prod <- prod(x)
# Geometric mean
prod^(1 / n)
}
# Test
geometric_vectorization(x)
[1] 18.64757
harmonic_for_loop <- function(x) {
# Length of vector
n <- length(x)
# Warning
if (is.numeric(x) == FALSE) {
abort("Vector is of the wrong type; input must be numeric")
} else if (n < 2) {
abort("Input vector must contain more than 1 element")
}
# Initialize x value
x_val <- as.double(1 / x[[1]])
# Create sequence
seq <- 1:(n - 1)
# Iterate
for (i in seq) {
x_val <- x_val + (1 / x[[i + 1]])
}
# Harmonic mean
n / x_val
}
# Test
# A function from the psych package
psych::harmonic.mean(x)
[1] 12.20518
[1] 12.20518
harmonic_vectorization <- function(x) {
# Length of vector
n <- length(x)
# Warning
if (is.numeric(x) == FALSE) {
abort("Vector is of the wrong type; input must be numeric")
} else if (n < 2) {
abort("Input vector must contain more than 1 element")
}
# Find element-wise reciprocals
x_reciprical <- 1 / x
# Sum the reciprocals
sum <- sum(x_reciprical)
# Harmonic mean
n / sum
}
# Test
harmonic_vectorization(x)
[1] 12.20518
# Function
every_nth_element_for_loop <- function(x, n) {
# Define the nth term
n <- n
# Initialize sequence
seq <- seq_along(x)
# Initialize counter
counter <- 0
# Pre-allocate container
new_x <- vector(mode = "double", length = length(x))
# Loop
for (i in seq) {
# Count the term
counter <- counter + 1
# If counter gets to n, copy that term to the container
if (counter == n) {
new_x[[i]] <- x[[i]]
# Reinitialize counter to zero
counter <- 0
}
}
# Sum
new_x
}
# Test vector
x <- sample(x = 1:203, size = 100, replace = TRUE)
x
[1] 171 146 184 200 99 8 179 170 50 9 36 202 76 131 174 22 3 162
[19] 196 164 148 141 136 144 121 156 47 88 8 169 174 197 74 100 122 18
[37] 48 14 92 13 115 95 150 122 134 140 165 27 115 139 143 147 164 196
[55] 95 132 132 21 90 49 57 119 115 73 132 163 180 176 98 155 82 141
[73] 170 89 27 39 178 53 64 74 148 128 173 88 165 40 77 41 163 100
[91] 130 140 104 122 190 72 92 35 26 31
[1] 0 0 0 0 0 0 0 0 0 0 0 0 76 0 0 0 0 0
[19] 0 0 0 0 0 0 0 156 0 0 0 0 0 0 0 0 0 0
[37] 0 0 92 0 0 0 0 0 0 0 0 0 0 0 0 147 0 0
[55] 0 0 0 0 0 0 0 0 0 0 132 0 0 0 0 0 0 0
[73] 0 0 0 0 0 53 0 0 0 0 0 0 0 0 0 0 0 0
[91] 130 0 0 0 0 0 0 0 0 0
[1] 786
# Function
every_nth_element_while_loop <- function(x, n) {
# Length of vector
length <- length(x)
# Initial value
value <- 0
# Initialize counter
counter <- n
# Loop
# Use modulo to ensure that, whenver the counter gets to the nth element, the logical evaluates to true
while (counter %% n == 0) {
# Extract the element from x using the index "counter"
# This counter is every nth element in the vector or the logical above wouldn't have evaluated to true
# Alter the value by add the nth term
value <- value + x[[counter]]
# Increase the counter by n
# Now the logical above will again evaluate to true
counter <- counter + n
# Exit condition
if (counter > length) {
break
}
}
# Sum
value
}
# Test (This result should corroborate with that of the function above)
every_nth_element_while_loop(x = x, n = 13)
[1] 786
seq()
[1] 786
Charting the flow of the following program is a good way to see how for loops work in R. We will write out the program line by line so as to understand what it is doing exactly.
[1] 3
[1] 3 10
[1] 3 10 5
[1] 3 10 5 16
We suppose that \(x(t)\) is the number of prey animals at the start of a year \(t\) (rabbits) and \(y(t)\) is the number of predators (foxes), then the Lotka-Volterra model is: \[\begin{align*} x(t+1) &=x(t)+b_{r} \cdot x(t)-d_{r} \cdot x(t) \cdot y(t) \\ y(t+1) &=y(t)+b_{f} \cdot d_{r} \cdot x(t) \cdot y(t)-d_{f} \cdot y(t) \end{align*}\]
where the parameters are defined by:
# Growth rate of rabbits
br <- 0.04
# Death rate of rabbits due to predation
dr <- 0.0005
# Death rate of foxes in the absence of of food
df <- 0.2
# Efficiency of turning predated rabbits into foxes
bf <- 0.1
# Initial predator/prey populations
x <- 4200
y <- 100
# Model output
while (x > 3900) { # line 1
cat("x =", x, " y =", y, "\n") # line 2
x.new <- (1 + br) * x - dr * x * y # line 3
y.new <- (1 - df) * y + bf * dr * x * y # line 4
x <- x.new # line 5
y <- y.new # line 6
} # line 7
x = 4200 y = 100
x = 4158 y = 101
x = 4114.341 y = 101.7979
x = 4069.499 y = 102.3799
x = 4023.962 y = 102.7356
x = 3978.218 y = 102.8587
x = 3932.749 y = 102.7467
cat
means start a new line, ensuring that the printed output are printed lines by line successively instead of just one line.find_min_max <- function(x, summary_stat) {
# Find minimum or maximum
if (summary_stat == "min") {
# Initialize minimum value
x_min <- x[[1]]
# Loop
for (i in 2:length(x)) {
if (x_min > x[[i]]) {
x_min <- x[[i]]
}
}
# Output
x_min
} else if (summary_stat == "max") {
# Initialize minimum value
x_max <- x[[1]]
# Loop
for (i in 2:length(x)) {
if (x_max < x[[i]]) {
x_max <- x[[i]]
}
}
# Output
x_max
} else {
# Warning
abort(message = "summary_stat must either be min or max")
}
}
The function above uses if statements and for loops; there may be a need to benchmark performance. However, we are not creating copies each time we create a binding from the name “x_min” to a new vector object. This is because the new vector object only has a single name bound to it, and so R applies the modify-in-place optimization.
[1] 20
[1] 1923
[1] 20
[1] 1923
That is it for control flows in R! Hopefully, this was helpful.