Using the Apply Family of Functions in R

Jul 25

apply()

This function is used to apply a function over the margins of an array or matrix. Try the following code in R to see apply() in action:

# Creating a matrix
mat1 <- matrix(1:9, nrow = 3)
print(mat1)
# Apply the sum function across the rows (Margin = 1)
print(apply(mat1, 1, sum))
# Apply the sum function across the columns (Margin = 2)
print(apply(mat1, 2, sum))
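
After printing the matrix, the code returns the row sums (12, 15, 18) followed by the column sums (6, 15, 24). Because matrix() fills column by column, the first row of mat1 is 1, 4, 7.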
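apply() is not limited to built-in functions such as sum. As a minimal sketch, the example below passes an anonymous function and a function that returns more than one value; the names row_ranges and col_ranges are illustrative only.

```r
# Same matrix as above (filled column by column)
mat1 <- matrix(1:9, nrow = 3)

# An anonymous function: the spread (max - min) of each row
row_ranges <- apply(mat1, 1, function(x) max(x) - min(x))
print(row_ranges)  # 6 6 6

# When the function returns two values per column (min and max),
# apply() returns a matrix with one column per input column
col_ranges <- apply(mat1, 2, range)
print(col_ranges)
```

The second call shows that the shape of the result depends on what the applied function returns, which is worth checking when you swap in your own function.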

lapply()

This function applies a function to each element of a list or vector and always returns a list. Try the following code in R to see lapply() in action:

# Creating a list of numeric vectors
list1 <- list(a = 1:5, b = 6:10, c = 11:15)
# Let's say we want to find the mean of each vector
# A common way might be to use a loop, but lapply() does it in one concise call
mean_list <- lapply(list1, mean)
print(mean_list)

You get back a list with one element per input vector: a is 3, b is 8, and c is 13.
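Any arguments you place after the function name are passed along to that function on every call. The sketch below assumes a made-up list, list2, that contains a missing value, to show why that is useful.

```r
# A made-up list where one vector contains a missing value
list2 <- list(a = c(1, 2, NA), b = c(4, 5, 6))

# Without na.rm, the mean of 'a' is NA
print(lapply(list2, mean))

# Extra arguments after the function are passed on to it
clean_means <- lapply(list2, mean, na.rm = TRUE)
print(clean_means)  # a = 1.5, b = 5

# An anonymous function works the same way
doubled <- lapply(list2, function(v) v * 2)
print(doubled)
```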

sapply()

This function works like lapply() but tries to simplify the result to a vector or matrix. Try the following code in R to see sapply() in action:

list1 <- list(a = 1:5, b = 6:10, c = 11:15)
mean_vec <- sapply(list1, mean)
print(mean_vec)

Here’s what you get: a named numeric vector in which a is 3, b is 8, and c is 13.

As you can see, sapply() returns a vector, whereas lapply() returned a list. You might prefer a vector to a list when creating a new data frame or performing a calculation.
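To illustrate why the vector form can be convenient, here is a short sketch that puts the result into a data frame and uses it in arithmetic. The names mean_df, centered, and same_vec are illustrative only.

```r
list1 <- list(a = 1:5, b = 6:10, c = 11:15)
mean_vec <- sapply(list1, mean)

# A named numeric vector drops straight into a data frame column
mean_df <- data.frame(group = names(mean_vec), mean = mean_vec)
print(mean_df)

# ...or into vectorized arithmetic
centered <- mean_vec - mean(mean_vec)
print(centered)  # a = -5, b = 0, c = 5

# The list returned by lapply() needs unlist() before it can be used this way
same_vec <- unlist(lapply(list1, mean))
print(same_vec)
```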

mapply()

Try the following code in R to see mapply() in action:

# Creating two vectors
vec1 <- 1:5
vec2 <- 6:10
# Summing corresponding elements from vec1 and vec2
sum_vec <- mapply(sum, vec1, vec2)
print(sum_vec)

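Here’s what you get: 7, 9, 11, 13, 15, the sum of each pair of corresponding elements.

As you can see, mapply() applies a function element by element over two or more vectors or lists: the first call receives the first element of each, the second call receives the second element of each, and so on. "In parallel" here refers to this pairing of elements, not to multi-core execution.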
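The same pairing works with any function of two or more arguments. The following sketch uses an anonymous function and then rep() to show what happens when the results differ in length; prod_vec and rep_list are illustrative names.

```r
vec1 <- 1:5
vec2 <- 6:10

# mapply() calls the function once per position: f(vec1[1], vec2[1]), ...
prod_vec <- mapply(function(x, y) x * y, vec1, vec2)
print(prod_vec)  # 6 14 24 36 50

# For simple arithmetic, the vectorized operator gives the same values
print(vec1 * vec2)

# Results of different lengths cannot be simplified,
# so mapply() returns a list here: rep(1, 3), rep(2, 2), rep(3, 1)
rep_list <- mapply(rep, 1:3, 3:1)
print(rep_list)
```

For plain arithmetic like this, the built-in operators are usually the simpler choice; mapply() earns its place when the function has no vectorized form.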

tapply()

Try the following code in R to see tapply() in action:

# Creating a numeric vector and a factor
num_vec <- 1:10
fac_vec <- factor(rep(letters[1:2], each = 5))
# Applying mean function over subsets of num_vec grouped by fac_vec
mean_vec <- tapply(num_vec, fac_vec, mean)
print(mean_vec)

Here’s what you get: a mean of 3 for group a (the values 1 to 5) and a mean of 8 for group b (the values 6 to 10).

As you can see, tapply() applies a function over subsets of a vector grouped by some other vector, typically a factor.
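In practice, the values and the grouping variable often live in the same data frame. Here is a minimal sketch using a small, made-up data frame called scores (not from any real data set).

```r
# A small, made-up data set: test scores for two classes
scores <- data.frame(
  class = c("x", "x", "y", "y", "y"),
  score = c(70, 80, 60, 90, 90)
)

# Mean score per class; the grouping vector is converted to a factor
class_means <- tapply(scores$score, scores$class, mean)
print(class_means)  # x = 75, y = 80

# Any summary function works, e.g. the number of scores per class
class_counts <- tapply(scores$score, scores$class, length)
print(class_counts)  # x = 2, y = 3
```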

Apply’s Advantages Over Loops

There are several reasons why you might prefer to use the apply family of functions over traditional loops in R.

Vectorization

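R is a vectorized language, which means that many of its operations work on whole vectors and matrices at once. The apply family of functions fits this style by operating on an entire collection in a single call. These functions can be faster than loops that grow their result one element at a time, especially when working with larger data sets, although a loop that preallocates its result is often comparable in speed.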
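To make the idea concrete, here is a sketch that computes the same squares three ways: with a vectorized operator, with sapply(), and with an explicit loop. No timings are claimed here, since relative speed depends on the data and the R version; you could wrap each approach in system.time() to measure it on your own machine.

```r
x <- 1:10

# Truly vectorized: one call operates on the whole vector
squares_vectorized <- x^2

# sapply(): the function is called once per element
squares_sapply <- sapply(x, function(v) v^2)

# An explicit loop with a preallocated result vector
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# All three approaches produce the same values
print(all(squares_vectorized == squares_sapply))
print(all(squares_vectorized == squares_loop))
```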

Readability

Code using apply functions can be more concise and easier to understand, which is always beneficial in collaborative environments or when you're revisiting your code after some time.

Avoiding Explicit Loop Indexing

When using loops, it's common to have to manage loop indices, which can lead to errors such as off-by-one mistakes. The apply functions handle the iteration behind the scenes, reducing the potential for mistakes.
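One well-known index pitfall is writing 1:length(x) when x might be empty. The sketch below, using a made-up empty list, shows the problem and how seq_along() or lapply() sidesteps it.

```r
empty <- list()

# Pitfall: 1:length(empty) is c(1, 0), so this loop body runs twice
count_bad <- 0
for (i in 1:length(empty)) {
  count_bad <- count_bad + 1
}
print(count_bad)  # 2

# seq_along() yields an empty sequence, so the body never runs
count_ok <- 0
for (i in seq_along(empty)) {
  count_ok <- count_ok + 1
}
print(count_ok)  # 0

# lapply() needs no index at all and returns an empty list
print(length(lapply(empty, mean)))  # 0
```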

Functional Programming

The apply family of functions allows you to take advantage of R's capabilities as a functional programming language, wherein functions are first-class objects that can be passed as arguments to other functions.
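As a brief sketch of what "first-class" means in practice, the example below stores functions in a list, passes them to sapply(), and builds a new function from another function. The names summaries, make_multiplier, and triple are hypothetical, chosen only for illustration.

```r
x <- c(2, 4, 6, 8)

# Functions are ordinary objects, so they can be stored in a list
summaries <- list(min = min, mean = mean, max = max)

# ...and each one can be applied to the same data
results <- sapply(summaries, function(f) f(x))
print(results)  # min = 2, mean = 5, max = 8

# A function can also build and return another function
make_multiplier <- function(k) function(v) v * k
triple <- make_multiplier(3)
print(sapply(x, triple))  # 6 12 18 24
```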

Less Coding

With apply functions, complex operations that would otherwise require multiple lines of code with loops can often be performed in a single line.
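For instance, finding the maximum of every column in a data frame takes several lines with a loop but one line with sapply(). The data frame df below is made up for the comparison.

```r
# A small, made-up data frame
df <- data.frame(height = c(150, 160, 170), weight = c(50, 60, 70))

# Loop version: preallocate, iterate, assign, then name the result
col_max_loop <- numeric(ncol(df))
for (i in seq_along(df)) {
  col_max_loop[i] <- max(df[[i]])
}
names(col_max_loop) <- names(df)

# sapply() version: one line, with names carried over automatically
col_max_apply <- sapply(df, max)

print(col_max_loop)
print(col_max_apply)
```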

However, it's worth noting that any speed advantage of apply functions tends to show up with larger data sets; for smaller data sets the difference may not be noticeable. Furthermore, when the logic is more complex, for example when each iteration depends on the result of the previous one, writing a loop might be more intuitive or provide more flexibility. So, while apply functions are a powerful tool, whether to use them or loops depends on the specific context and requirements.

