The Apply Series
1 Introduction
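Base R's apply family includes several functions; this document covers four of them: apply, sapply, lapply, and tapply. Each one applies a function over the pieces of a data structure: apply works on the rows or columns of a matrix, sapply and lapply on the elements of a vector or list, and tapply on groups within a vector. These functions can often be used to replace a for loop.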
First create a dataset, 30 rows by 3 columns.
mat <- matrix(data = cbind(rnorm(30), rnorm(30, 2, 1), rnorm(30, 5, 2)), nrow = 30, ncol = 3)
head(mat)
##            [,1]     [,2]     [,3]
## [1,]  0.4032129 2.771183 5.844326
## [2,]  0.6296605 1.431866 7.043301
## [3,]  1.8996047 2.492309 3.727005
## [4,] -1.9772721 2.165451 2.677211
## [5,]  1.8173373 1.951150 5.184133
## [6,] -0.6785691 1.151731 5.044919
2 apply
apply calls a function on each row or column of a matrix, usually with much less code than a for loop. Its second argument selects the direction: 1 for rows, 2 for columns.
This uses a for loop to print out the mean of each row.
for (i in seq_len(nrow(mat))) {print(mean(mat[i, ]))}
## [1] 3.006241
## [1] 3.034943
## [1] 2.706307
## [1] 0.95513
## [1] 2.984207
## [1] 1.83936
## [1] 1.947223
## [1] 2.314033
## [1] 1.153832
## [1] 0.9363747
## [1] 2.725052
## [1] 2.267135
## [1] 2.327162
## [1] 3.17706
## [1] 2.975939
## [1] 2.37326
## [1] 3.261666
## [1] 1.758458
## [1] 2.823152
## [1] 2.488773
## [1] 2.025731
## [1] 2.347536
## [1] 2.442898
## [1] 3.00893
## [1] 2.170108
## [1] 1.863044
## [1] 3.346938
## [1] 2.203084
## [1] 2.191656
## [1] 3.560264
This for loop prints the mean of each column in mat.
for (i in seq_len(ncol(mat))) {print(mean(mat[, i]))}
## [1] 0.2374958
## [1] 1.955617
## [1] 5.028437
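The loops above only print their results. As a rough sketch of what a loop needs in order to keep the results for later use, the example below preallocates a vector and fills it in. The matrix m, the seed, and the name col_means are illustrative assumptions and are not part of the dataset above.

```r
# Illustrative data: a small random matrix with a fixed seed (assumed values)
set.seed(1)
m <- matrix(rnorm(12), nrow = 4, ncol = 3)

# Preallocate one slot per column, then fill it inside the loop
col_means <- numeric(ncol(m))
for (j in seq_len(ncol(m))) {
  col_means[j] <- mean(m[, j])
}

# The stored loop results agree with the one-line apply call
print(isTRUE(all.equal(col_means, apply(m, 2, mean))))
```

The bookkeeping (preallocation, index variable, assignment) is what apply takes care of.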
For loops work, but the same result can be achieved with less code.
This uses apply to calculate the mean for each row of mat.
head(apply(mat, 1, mean))
## [1] 3.006241 3.034943 2.706307 0.955130 2.984207 1.839360
This applies the mean function to each column of mat.
apply(mat, 2, mean)
## [1] 0.2374958 1.9556170 5.0284369
Each column is passed to the function as a plain vector of length 30:
apply(mat, 2, is.vector)
## [1] TRUE TRUE TRUE
apply(mat, 2, length)
## [1] 30 30 30
Admittedly you could also use colMeans or rowMeans, but we’re talking about apply right now.
colMeans(mat)
## [1] 0.2374958 1.9556170 5.0284369
head(rowMeans(mat))
## [1] 3.006241 3.034943 2.706307 0.955130 2.984207 1.839360
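To make the row and column directions easy to check by hand, here is a minimal sketch on a small fixed matrix. The name m is illustrative and separate from the random mat used above.

```r
# Small fixed matrix so the results can be verified by hand
m <- matrix(1:6, nrow = 2, ncol = 3)
# m is:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

row_means <- apply(m, 1, mean)  # second argument 1: one value per row
col_means <- apply(m, 2, mean)  # second argument 2: one value per column

print(row_means)  # 3 4
print(col_means)  # 1.5 3.5 5.5

# rowMeans and colMeans give the same numbers
print(isTRUE(all.equal(row_means, rowMeans(m))))
print(isTRUE(all.equal(col_means, colMeans(m))))
```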
3 Using a function with apply
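This attempts to count the values greater than 0 in each column but returns an error, because length(x[x>0]) is an expression that R tries to evaluate itself rather than a function that apply can call on each column:
apply(mat, 2, length(x[x>0]))
The expression has to be wrapped in an anonymous function, “function (x)”. No error here:
apply(mat, 2, function(x) length(x[x>0]))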
## [1] 18 29 30
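The difference between the failing and the working call can be checked directly. The sketch below uses a small illustrative matrix m (not the mat above) and base R's tryCatch to capture the error as a value instead of stopping the script.

```r
# Small illustrative matrix; the name m is an assumption, not the data above
m <- matrix(c(-1, 2, 3, -4, 5, 6), nrow = 3)

# Passing an expression: when apply looks at its function argument, R has to
# evaluate length(x[x > 0]) first, and the result is not a function.
# tryCatch returns the error object so it can be inspected.
bad <- tryCatch(
  apply(m, 2, length(x[x > 0])),
  error = function(e) e
)
print(inherits(bad, "error"))  # TRUE

# Passing a function: apply calls it once per column, with x bound to that column
good <- apply(m, 2, function(x) length(x[x > 0]))
print(good)  # 2 2
```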
This works, because sum is already a function:
apply(mat, 2, sum)
## [1] 7.124874 58.668510 150.853108
This returns an error, because sum(x + 1) is an expression rather than a function:
apply(mat, 2, sum(x + 1))
This adds one to each number and then adds all the numbers together:
apply(mat, 2, function(x) sum(x + 1))
## [1] 37.12487 88.66851 180.85311
This adds all numbers greater than 0:
apply(mat, 2, function(x) sum(x[x>0]))
## [1] 16.95222 58.66955 150.85311
This returns the mean of all numbers greater than 0:
apply(mat, 2, function(x) mean(x[x>0]))
## [1] 0.9417899 2.0230879 5.0284369
4 sapply and lapply
Use lapply when you want a list and sapply when you want a vector; sapply simplifies its result to a vector (or matrix) where it can. Both work across the elements of a vector or list.
sapply returns a vector.
sapply(1:3, function(x) x^2)
## [1] 1 4 9
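Given a matrix, sapply treats it as a plain vector of its 90 elements, taken column by column, rather than working on rows or columns:
sapply(mat, function(x) x^2)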
## [1] 1.625807e-01 3.964724e-01 3.608498e+00 3.909605e+00 3.302715e+00
## [6] 4.604560e-01 1.497467e-02 6.564799e+00 1.318396e+00 1.127890e+00
## [11] 4.511347e-01 5.280054e-01 2.061443e+00 1.092626e-01 1.201799e-01
## [16] 5.551805e-01 4.151049e-01 4.291583e-01 2.504925e-01 4.398301e-01
## [21] 7.491466e-01 1.910609e-01 1.231818e+00 6.677822e-01 6.532021e-01
## [26] 1.021982e+00 1.295269e-01 7.226142e-03 2.146680e-02 4.491779e+00
## [31] 7.679458e+00 2.050242e+00 6.211607e+00 4.689178e+00 3.806987e+00
## [36] 1.326484e+00 1.331745e+00 5.640956e+00 1.081441e-06 5.633132e-01
## [41] 5.607748e+00 2.251085e+00 4.736829e+00 6.108216e+00 5.470297e+00
## [46] 3.044711e+00 3.898562e+00 3.103372e+00 6.127218e+00 4.541067e+00
## [51] 3.552846e+00 8.582455e+00 1.113985e+00 1.882778e+00 2.219573e+00
## [56] 2.998063e+00 1.079979e+01 1.161231e+01 3.162724e+00 6.491877e+00
## [61] 3.415614e+01 4.960809e+01 1.389057e+01 7.167459e+00 2.687524e+01
## [66] 2.545121e+01 2.313637e+01 4.019398e+00 2.125902e+01 9.738168e+00
## [71] 4.197424e+01 2.092516e+01 3.894799e+01 5.461578e+01 3.896607e+01
## [76] 2.143472e+01 5.135493e+01 1.737920e+01 4.218017e+01 2.182894e+01
## [81] 1.106733e+01 2.070368e+01 2.666036e+01 4.675097e+01 3.397387e+01
## [86] 8.103745e+00 4.089108e+01 9.712948e+00 2.162294e+01 3.616212e+01
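The flattening is easier to see on a small matrix. The sketch below uses an illustrative 2 by 3 matrix m (not the mat above) and compares the sapply result with plain vectorized arithmetic, which keeps the matrix shape.

```r
# Small fixed matrix; the name m is illustrative
m <- matrix(1:6, nrow = 2)

# sapply visits the six elements one at a time and returns a plain vector
flat <- sapply(m, function(x) x^2)
print(flat)                # 1 4 9 16 25 36
print(is.null(dim(flat)))  # TRUE: the 2 x 3 shape is gone

# Vectorized arithmetic produces the same values and keeps the dimensions
squared <- m^2
print(dim(squared))        # 2 3
```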
lapply returns a list.
lapply(1:3, function(x) x^2)
## [[1]]
## [1] 1
##
## [[2]]
## [1] 4
##
## [[3]]
## [1] 9
Applied to mat, lapply returns a long list of 90 elements, one per matrix entry.
long_list <- lapply(mat, function(x) x^2)
head(long_list)
## [[1]]
## [1] 0.1625807
##
## [[2]]
## [1] 0.3964724
##
## [[3]]
## [1] 3.608498
##
## [[4]]
## [1] 3.909605
##
## [[5]]
## [1] 3.302715
##
## [[6]]
## [1] 0.460456
Wrapping the call in unlist turns the list into a vector.
unlist(lapply(1:3, function(x) x^2))
## [1] 1 4 9
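The relationship between the two functions can be checked with a short sketch. The variable names squares_list and squares_vec are illustrative.

```r
# Same input and function, two different return types
squares_list <- lapply(1:3, function(x) x^2)
squares_vec  <- sapply(1:3, function(x) x^2)

print(is.list(squares_list))    # TRUE
print(is.list(squares_vec))     # FALSE
print(is.numeric(squares_vec))  # TRUE

# For scalar results like these, unlist(lapply(...)) matches sapply(...)
print(isTRUE(all.equal(unlist(squares_list), squares_vec)))  # TRUE
```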
You can also make sapply return a list by turning off simplification.
sapply(1:3, function(x) x^2, simplify = FALSE)
## [[1]]
## [1] 1
##
## [[2]]
## [1] 4
##
## [[3]]
## [1] 9
You can run sapply over the column indices of a matrix to get the same result as apply.
sapply(1:3, function(x) mean(mat[,x])) == apply(mat, 2, mean)
## [1] TRUE TRUE TRUE
Extra arguments given to sapply are passed on to the function. This version is more roundabout, passing mat in as y, but it works and returns the mean of each column of mat.
sapply(1:3, function(x, y) mean(y[,x]), y = mat)
## [1] 0.2374958 1.9556170 5.0284369
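A simpler sketch of the same mechanism: any named argument that sapply does not use itself is handed on to the function. The argument name power below is hypothetical, chosen only for illustration; digits is an existing argument of base R's round.

```r
# power is an illustrative extra argument, passed through sapply to the function
cubes <- sapply(1:3, function(x, power) x^power, power = 3)
print(cubes)  # 1 8 27

# The same works for existing functions: digits is forwarded to round
rounded <- sapply(c(1.234, 5.678), round, digits = 1)
print(rounded)  # 1.2 5.7
```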
5 tapply
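tapply applies a function to a vector by group, where the groups are given by a second vector of the same length. The examples use the built-in mtcars dataset.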
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
In this case tapply gives the mean horsepower for each number of cylinders in an engine.
tapply(mtcars$hp, mtcars$cyl, mean)
##         4         6         8
##  82.63636 122.28571 209.21429
Or the mean mpg for each number of cylinders:
tapply(mtcars$mpg, mtcars$cyl, mean)
##        4        6        8
## 26.66364 19.74286 15.10000
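One way to picture what tapply does is as a split followed by sapply. The sketch below is an illustration of that idea using base R's split, not a description of how tapply is implemented internally; the names hp_by_cyl and hp_split are illustrative.

```r
# Grouped mean with tapply, as above
hp_by_cyl <- tapply(mtcars$hp, mtcars$cyl, mean)

# Same idea in two steps: split hp into one vector per cyl value,
# then take the mean of each piece
hp_split <- sapply(split(mtcars$hp, mtcars$cyl), mean)

print(names(hp_by_cyl))  # "4" "6" "8"
print(isTRUE(all.equal(as.vector(hp_by_cyl), as.vector(hp_split))))  # TRUE
```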