The Apply Series

1 Introduction

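This tutorial covers four functions in the apply series: apply, sapply, lapply, and tapply. apply works on the rows or columns of a matrix, sapply and lapply work on each element of a vector or list, and tapply works on groups within a vector. Each of these functions can be used to replace a for loop.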

First create a dataset, 30 rows by 3 columns.

mat <- matrix(data = cbind(rnorm(30), rnorm(30, 2, 1), rnorm(30, 5, 2)), nrow = 30, ncol = 3)
head(mat)
##            [,1]     [,2]     [,3]
## [1,]  0.4032129 2.771183 5.844326
## [2,]  0.6296605 1.431866 7.043301
## [3,]  1.8996047 2.492309 3.727005
## [4,] -1.9772721 2.165451 2.677211
## [5,]  1.8173373 1.951150 5.184133
## [6,] -0.6785691 1.151731 5.044919
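Because rnorm draws random numbers, the values shown above will differ on every run. As a minimal sketch, assuming you want repeatable output, you can fix the random seed before building the matrix. The seed value 42 and the name mat_demo are arbitrary choices for illustration and are not part of the original example.

```r
# Fix the random seed so the same matrix is generated on every run.
# The seed value (42) is an arbitrary choice for illustration.
set.seed(42)

# cbind() already returns a 30 x 3 matrix from three length-30 vectors,
# so the extra matrix() wrapper is optional.
mat_demo <- cbind(rnorm(30),
                  rnorm(30, mean = 2, sd = 1),
                  rnorm(30, mean = 5, sd = 2))

dim(mat_demo)   # 30 3
```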

2 apply


apply calls a function on each row or column of a matrix, and it usually needs less code than a for loop. The second argument selects the direction: 1 for rows, 2 for columns.

This uses a for loop to print out the mean of each row.

for (i in 1:dim(mat)[1]){print(mean(mat[i,]))}
## [1] 3.006241
## [1] 3.034943
## [1] 2.706307
## [1] 0.95513
## [1] 2.984207
## [1] 1.83936
## [1] 1.947223
## [1] 2.314033
## [1] 1.153832
## [1] 0.9363747
## [1] 2.725052
## [1] 2.267135
## [1] 2.327162
## [1] 3.17706
## [1] 2.975939
## [1] 2.37326
## [1] 3.261666
## [1] 1.758458
## [1] 2.823152
## [1] 2.488773
## [1] 2.025731
## [1] 2.347536
## [1] 2.442898
## [1] 3.00893
## [1] 2.170108
## [1] 1.863044
## [1] 3.346938
## [1] 2.203084
## [1] 2.191656
## [1] 3.560264


This prints the mean of each column in mat.

for (i in 1:dim(mat)[2]){print(mean(mat[,i]))}
## [1] 0.2374958
## [1] 1.955617
## [1] 5.028437


For loops work, but the same result can be achieved with less code.

This uses apply to calculate the mean for each row of mat.

head(apply (mat, 1, mean)) 
## [1] 3.006241 3.034943 2.706307 0.955130 2.984207 1.839360


This applies the mean function to each column of mat.

apply (mat, 2, mean) 
## [1] 0.2374958 1.9556170 5.0284369


apply (mat, 2, is.vector)
## [1] TRUE TRUE TRUE


apply (mat, 2, length)
## [1] 30 30 30


Admittedly you could also use colMeans or rowMeans, but we’re talking about apply right now.

colMeans(mat)
## [1] 0.2374958 1.9556170 5.0284369


head(rowMeans(mat))
## [1] 3.006241 3.034943 2.706307 0.955130 2.984207 1.839360
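The comparison between the for loop, apply, and rowMeans can be checked on a matrix small enough to verify by hand. This is a sketch only; the matrix m and the variable names are made up for illustration.

```r
# Small fixed matrix so the results are easy to check by hand.
# Filled column by column: columns are (1, 2), (3, 4), (5, 6).
m <- matrix(1:6, nrow = 2, ncol = 3)

# for-loop version: preallocate a result vector, then fill it row by row.
row_means_loop <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
  row_means_loop[i] <- mean(m[i, ])
}

# apply version: the second argument is 1 for rows, 2 for columns.
row_means_apply <- apply(m, 1, mean)
col_means_apply <- apply(m, 2, mean)

row_means_apply   # 3 4
col_means_apply   # 1.5 3.5 5.5

# All three approaches agree on the row means.
all.equal(row_means_loop, row_means_apply)
all.equal(row_means_apply, rowMeans(m))
```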


3 Using a function with apply


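This attempts to count the values greater than 0 in each column, but it returns an error. The third argument must be a function, and length(x[x>0]) is an expression that R tries to evaluate immediately: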

apply (mat, 2, length(x[x>0]))

Wrapping the expression in “function (x)” turns it into a function that apply can call on each column. No error here:

apply (mat, 2, function (x) length(x[x>0]))
## [1] 18 29 30
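To see the difference between passing an expression and passing a function, here is a small sketch on a hand-made matrix. The matrix m and the helper name count_positive are hypothetical names chosen for this illustration.

```r
# Columns are (-1, 2, 3) and (-4, -5, 6).
m <- matrix(c(-1, 2, 3, -4, -5, 6), nrow = 3)

# Passing an expression instead of a function fails; tryCatch() captures
# the failure so the script keeps running.
attempt <- tryCatch(
  apply(m, 2, length(x[x > 0])),
  error = function(e) "error"
)
attempt   # "error"

# An anonymous function works ...
counts_anon <- apply(m, 2, function(x) length(x[x > 0]))

# ... and so does a named helper that does the same thing.
count_positive <- function(x) length(x[x > 0])
counts_named <- apply(m, 2, count_positive)

counts_anon    # 2 1
counts_named   # 2 1
```

Naming the helper can be easier to read when the same function is reused in several apply calls.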


Passing a function by name works:

apply (mat, 2, sum)
## [1]   7.124874  58.668510 150.853108


This returns an error, because sum(x + 1) is an expression rather than a function:

apply (mat, 2, sum(x + 1))

This adds one to each number in a column and then adds all of the numbers together:

apply (mat, 2, function (x) sum(x + 1))
## [1]  37.12487  88.66851 180.85311


This adds all numbers greater than 0 in each column:

apply (mat, 2, function (x) sum(x[x>0]))
## [1]  16.95222  58.66955 150.85311


This returns the mean of all numbers greater than 0 in each column:

apply (mat, 2, function (x) mean(x[x>0]))
## [1] 0.9417899 2.0230879 5.0284369
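An anonymous function is not the only way to customise the call. Any arguments given to apply after the function are passed on to that function. A minimal sketch, using a made-up matrix m that contains a missing value:

```r
# Columns are (1, NA, 3) and (4, 5, 6).
m <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 3)

# Without extra arguments, a column containing NA has an NA mean.
plain <- apply(m, 2, mean)

# na.rm = TRUE is forwarded by apply to mean().
forwarded <- apply(m, 2, mean, na.rm = TRUE)

# The equivalent call written with an anonymous function.
wrapped <- apply(m, 2, function(x) mean(x, na.rm = TRUE))

plain       # NA 5
forwarded   # 2 5
wrapped     # 2 5
```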


4 sapply and lapply

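Use lapply when you want a list and sapply when you want a vector. Both call a function on each element of a vector or list; sapply then simplifies the result to a vector (or matrix) where it can, while lapply always returns a list.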

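sapply returns a vector. When it is given a matrix, it treats the matrix as one long vector and calls the function on every element.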

sapply(1:3, function(x) x^2)
## [1] 1 4 9


sapply(mat, function (x) x^2)
##  [1] 1.625807e-01 3.964724e-01 3.608498e+00 3.909605e+00 3.302715e+00
##  [6] 4.604560e-01 1.497467e-02 6.564799e+00 1.318396e+00 1.127890e+00
## [11] 4.511347e-01 5.280054e-01 2.061443e+00 1.092626e-01 1.201799e-01
## [16] 5.551805e-01 4.151049e-01 4.291583e-01 2.504925e-01 4.398301e-01
## [21] 7.491466e-01 1.910609e-01 1.231818e+00 6.677822e-01 6.532021e-01
## [26] 1.021982e+00 1.295269e-01 7.226142e-03 2.146680e-02 4.491779e+00
## [31] 7.679458e+00 2.050242e+00 6.211607e+00 4.689178e+00 3.806987e+00
## [36] 1.326484e+00 1.331745e+00 5.640956e+00 1.081441e-06 5.633132e-01
## [41] 5.607748e+00 2.251085e+00 4.736829e+00 6.108216e+00 5.470297e+00
## [46] 3.044711e+00 3.898562e+00 3.103372e+00 6.127218e+00 4.541067e+00
## [51] 3.552846e+00 8.582455e+00 1.113985e+00 1.882778e+00 2.219573e+00
## [56] 2.998063e+00 1.079979e+01 1.161231e+01 3.162724e+00 6.491877e+00
## [61] 3.415614e+01 4.960809e+01 1.389057e+01 7.167459e+00 2.687524e+01
## [66] 2.545121e+01 2.313637e+01 4.019398e+00 2.125902e+01 9.738168e+00
## [71] 4.197424e+01 2.092516e+01 3.894799e+01 5.461578e+01 3.896607e+01
## [76] 2.143472e+01 5.135493e+01 1.737920e+01 4.218017e+01 2.182894e+01
## [81] 1.106733e+01 2.070368e+01 2.666036e+01 4.675097e+01 3.397387e+01
## [86] 8.103745e+00 4.089108e+01 9.712948e+00 2.162294e+01 3.616212e+01


lapply returns a list.

lapply(1:3, function(x) x^2)
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 4
## 
## [[3]]
## [1] 9


Applied to mat, lapply treats the matrix as a vector of 90 values, so the result is a list 90 elements long.

long_list <- lapply(mat, function (x) x^2)
head(long_list)
## [[1]]
## [1] 0.1625807
## 
## [[2]]
## [1] 0.3964724
## 
## [[3]]
## [1] 3.608498
## 
## [[4]]
## [1] 3.909605
## 
## [[5]]
## [1] 3.302715
## 
## [[6]]
## [1] 0.460456


Adding unlist turns it into a vector.

unlist(lapply(1:3, function (x) x^2))
## [1] 1 4 9


You can also make sapply return a list by turning off simplification.

sapply(1:3, function (x) x^2, simplify = FALSE)
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 4
## 
## [[3]]
## [1] 9


You can run sapply over the column indices of a matrix to get the same result as apply.

sapply(1:3, function(x) mean(mat[,x])) == apply (mat, 2, mean)
## [1] TRUE TRUE TRUE


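Extra arguments given to sapply are passed on to the function; here mat is passed in as y. This is more complicated than it needs to be, but it works, returning a mean for each column of mat.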

sapply(1:3, function(x, y) mean(y[,x]), y = mat)
## [1] 0.2374958 1.9556170 5.0284369
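The examples in this section all return one value per element, so sapply produces a plain vector. As a sketch of what happens otherwise: when the function returns more than one value per element, sapply simplifies to a matrix with one column per element, while lapply still returns a list. The names value and square are arbitrary labels chosen for this illustration.

```r
# Each call returns two named values.
two_values <- function(x) c(value = x, square = x^2)

# sapply simplifies the three length-2 results into a 2 x 3 matrix.
res_matrix <- sapply(1:3, two_values)
dim(res_matrix)           # 2 3
res_matrix["square", ]    # 1 4 9

# lapply keeps one list element per input.
res_list <- lapply(1:3, two_values)
length(res_list)          # 3
res_list[[2]]             # value 2, square 4
```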


5 tapply

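tapply allows us to apply a function to a dataset by group. The first argument is the vector of values, the second is the grouping variable, and the third is the function to call on each group.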

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1


In this case tapply will give mean horsepower based on the number of cylinders in an engine.

tapply(mtcars$hp, mtcars$cyl, mean)
##         4         6         8 
##  82.63636 122.28571 209.21429


Or mean mpg based on the number of cylinders:

tapply(mtcars$mpg, mtcars$cyl, mean)
##        4        6        8 
## 26.66364 19.74286 15.10000
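One way to picture what tapply is doing is to split the values into groups and then call the function on each group. The sketch below reproduces the horsepower example that way using split and sapply; it is an illustration of the idea, not a description of how tapply is implemented internally.

```r
# Split horsepower into one vector per cylinder count.
groups <- split(mtcars$hp, mtcars$cyl)
names(groups)   # "4" "6" "8"

# Take the mean of each group.
by_split <- sapply(groups, mean)

# The same grouped means from tapply.
by_tapply <- tapply(mtcars$hp, mtcars$cyl, mean)

# Both approaches give the same numbers in the same group order.
all.equal(as.vector(by_split), as.vector(by_tapply))
```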


6 Additional Websites

To see more about the apply series see here:

R-Bloggers Using Apply Series