Pre-lecture materials
Read ahead
Prerequisites
Before starting, you must install the additional package:
- purrr: this provides a consistent functional programming interface to work with functions and vectors
You can do this by calling
install.packages("purrr")
or by using the “Install Packages…” option from the “Tools” menu in RStudio.
Acknowledgements
Material for this lecture was borrowed and adapted from
Learning objectives
Functional Programming
The characteristics
At its core, functional programming treats functions like any other data structure; in other words, functions are first-class.
In R, this means that you can do many of the things with a function that you can do with a vector: you can assign them to variables, store them in lists, pass them as arguments to other functions, create them inside functions, and even return them as the result of a function.
What do you mean?
- Assign a function to a variable
foo <- function(){
  return("This is foo.")
}
class(foo)
[1] "function"
- Store functions in a list
foo_list <- list(
  fun_1 = function() return("foo_1"),
  fun_2 = function() return("foo_2")
)
str(foo_list)
List of 2
$ fun_1:function ()
..- attr(*, "srcref")= 'srcref' int [1:8] 2 11 2 36 11 36 2 2
.. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7f925303de48>
$ fun_2:function ()
..- attr(*, "srcref")= 'srcref' int [1:8] 3 11 3 36 11 36 3 3
.. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7f925303de48>
- Pass functions as arguments to other functions
shell <- function(f) f()
shell(foo_list$fun_1)
[1] "foo_1"
shell(foo_list$fun_2)
[1] "foo_2"
- Create functions inside of functions & return them as the result of a function
foo_wrap <- function(){
  foo_2 <- function(){
    return("This is foo_2.")
  }
  return(foo_2)
}
foo_wrap()
function(){
return("This is foo_2.")
}
<environment: 0x7f92410bf898>
(foo_wrap())()
[1] "This is foo_2."
The bottom line: you can manipulate functions in the same way that you manipulate a vector or a matrix.
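As a further illustration of creating and returning functions, here is a minimal sketch of a function factory. The names make_multiplier, double_it, triple_it and multipliers are hypothetical and are not part of the lecture material; the example only uses base R.

```r
# Hypothetical function factory: each call returns a new function
# that multiplies its input by the captured value k.
make_multiplier <- function(k) {
  force(k)  # evaluate k now so the returned function captures its value
  function(x) x * k
}

double_it <- make_multiplier(2)
triple_it <- make_multiplier(3)

# The returned functions can be stored in a list and called from there
multipliers <- list(double = double_it, triple = triple_it)
multipliers$double(5)
multipliers$triple(5)
```

Each returned function remembers its own k, which is why double_it and triple_it behave differently even though they were produced by the same code.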
Why is functional programming important?
Functional programming introduces a new style of programming, namely the functional style. Broadly speaking, this style encourages programmers to split one big function into many smaller, isolated functions, each of which addresses one specific task.
As a by-product, the functional style encourages code that is easier to read and easier to reuse.
"data_set.csv" |>
import_data_from_file() |>
data_cleaning() |>
run_regression() |>
model_diagnostics() |>
model_visualization()
"data_set2.csv" |>
import_data_from_file() |>
data_cleaning() |>
run_different_regression() |>
model_diagnostics() |>
model_visualization()
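The function names in the two pipelines above are placeholders. As a runnable sketch of the same idea, the example below chains four small single-purpose functions on the built-in mtcars data. All four function names are hypothetical and chosen only for illustration; the native pipe requires R 4.1 or later.

```r
# Hypothetical single-purpose steps; each takes one input and returns one output
select_columns <- function(dat) dat[, c("mpg", "wt")]
drop_missing   <- function(dat) dat[complete.cases(dat), ]
fit_model      <- function(dat) lm(mpg ~ wt, data = dat)
extract_slope  <- function(fit) unname(coef(fit)["wt"])

# Chain the steps: the output of one function is the input of the next
slope <- mtcars |>
  select_columns() |>
  drop_missing() |>
  fit_model() |>
  extract_slope()
slope
```

Swapping fit_model() for a different modelling function changes one step without touching the rest of the pipeline, which is the kind of reuse the functional style is meant to encourage.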
purrr: the functional programming toolkit
The R package purrr, as one important component of the tidyverse, provides an interface to manipulate vectors in the functional style.
purrr enhances R’s functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors.
The most popular function in purrr is map(), which iterates over the supplied data structure and applies a function to each element. Besides the map function, purrr also offers a series of useful functions for manipulating lists.
The map family
The map family of functions provides a convenient way to iterate through vectors or lists and apply a function to each element. Depending on the number of inputs and the format of the output, there are many different variants of the basic map function.
map as a for loop
library(purrr)
triple <- function(x) x * 3

# for loop
loop_ret <- list()
for(i in 1:3){
  loop_ret[i] <- triple(i)
}

# map implementation
map_eg1 <- map(.x = 1:3, .f = triple)
map_eg2 <- map(.x = 1:3, .f = ~triple(.x))
map_eg3 <- map(.x = 1:3, .f = function(x) triple(x))
identical(loop_ret,map_eg1)
[1] TRUE
identical(loop_ret,map_eg2)
[1] TRUE
identical(loop_ret,map_eg3)
[1] TRUE
map with a data frame
tmp_dat <- data.frame(
  x = 1:5,
  y = 6:10
)
tmp_dat |>
  map(.f = mean)
$x
[1] 3
$y
[1] 8
# Alternatively
# map(.x = tmp_dat, .f = mean)
Extra arguments for functions
tmp_dat2 <- as.list(tmp_dat)
tmp_dat2$y[6] <- NA
str(tmp_dat2)
List of 2
$ x: int [1:5] 1 2 3 4 5
$ y: int [1:6] 6 7 8 9 10 NA
tmp_dat2 |> map(.f = mean) # No extra arguments
$x
[1] 3
$y
[1] NA
tmp_dat2 |>
  map(.f = mean, na.rm = TRUE) # With extra arguments
$x
[1] 3
$y
[1] 8
tmp_dat2 |>
  map(.f = function(x, remove_na) mean(x, na.rm = remove_na),
      remove_na = TRUE)
$x
[1] 3
$y
[1] 8
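Extra arguments can also be fixed inside an anonymous function instead of being passed through map. The sketch below compares the two approaches on a small made-up list (scores is hypothetical data); it assumes R 4.1 or later for the \(x) shorthand.

```r
library(purrr)

# Made-up data with missing values
scores <- list(a = c(1, 2, NA), b = c(4, NA, 6))

# Extra argument forwarded by map() to mean()
via_dots <- map(scores, mean, na.rm = TRUE)

# Extra argument fixed inside an anonymous function
via_lambda <- map(scores, \(x) mean(x, na.rm = TRUE))

identical(via_dots, via_lambda)
```

The anonymous-function form makes it explicit which function receives na.rm, which can help when the call has several arguments.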
Stratified analysis with map
We use the mtcars dataset from the package datasets to demonstrate.
library(datasets)
str(mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
unique(mtcars$cyl) # different numbers of cylinders
[1] 6 4 8
We are interested in the average miles per gallon for vehicles with different numbers of cylinders.
# Create one data frame per cylinder level
str_dat <- mtcars |> split(mtcars$cyl)
length(str_dat)
[1] 3
str(str_dat)
List of 3
$ 4:'data.frame': 11 obs. of 11 variables:
..$ mpg : num [1:11] 22.8 24.4 22.8 32.4 30.4 33.9 21.5 27.3 26 30.4 ...
..$ cyl : num [1:11] 4 4 4 4 4 4 4 4 4 4 ...
..$ disp: num [1:11] 108 146.7 140.8 78.7 75.7 ...
..$ hp : num [1:11] 93 62 95 66 52 65 97 66 91 113 ...
..$ drat: num [1:11] 3.85 3.69 3.92 4.08 4.93 4.22 3.7 4.08 4.43 3.77 ...
..$ wt : num [1:11] 2.32 3.19 3.15 2.2 1.61 ...
..$ qsec: num [1:11] 18.6 20 22.9 19.5 18.5 ...
..$ vs : num [1:11] 1 1 1 1 1 1 1 1 0 1 ...
..$ am : num [1:11] 1 0 0 1 1 1 0 1 1 1 ...
..$ gear: num [1:11] 4 4 4 4 4 4 3 4 5 5 ...
..$ carb: num [1:11] 1 2 2 1 2 1 1 1 2 2 ...
$ 6:'data.frame': 7 obs. of 11 variables:
..$ mpg : num [1:7] 21 21 21.4 18.1 19.2 17.8 19.7
..$ cyl : num [1:7] 6 6 6 6 6 6 6
..$ disp: num [1:7] 160 160 258 225 168 ...
..$ hp : num [1:7] 110 110 110 105 123 123 175
..$ drat: num [1:7] 3.9 3.9 3.08 2.76 3.92 3.92 3.62
..$ wt : num [1:7] 2.62 2.88 3.21 3.46 3.44 ...
..$ qsec: num [1:7] 16.5 17 19.4 20.2 18.3 ...
..$ vs : num [1:7] 0 0 1 1 1 1 0
..$ am : num [1:7] 1 1 0 0 0 0 1
..$ gear: num [1:7] 4 4 3 3 4 4 5
..$ carb: num [1:7] 4 4 1 1 4 4 6
$ 8:'data.frame': 14 obs. of 11 variables:
..$ mpg : num [1:14] 18.7 14.3 16.4 17.3 15.2 10.4 10.4 14.7 15.5 15.2 ...
..$ cyl : num [1:14] 8 8 8 8 8 8 8 8 8 8 ...
..$ disp: num [1:14] 360 360 276 276 276 ...
..$ hp : num [1:14] 175 245 180 180 180 205 215 230 150 150 ...
..$ drat: num [1:14] 3.15 3.21 3.07 3.07 3.07 2.93 3 3.23 2.76 3.15 ...
..$ wt : num [1:14] 3.44 3.57 4.07 3.73 3.78 ...
..$ qsec: num [1:14] 17 15.8 17.4 17.6 18 ...
..$ vs : num [1:14] 0 0 0 0 0 0 0 0 0 0 ...
..$ am : num [1:14] 0 0 0 0 0 0 0 0 0 0 ...
..$ gear: num [1:14] 3 3 3 3 3 3 3 3 3 3 ...
..$ carb: num [1:14] 2 4 3 3 3 4 4 4 2 2 ...
str_dat |>
  map(.f = ~mean(.x$mpg))
$`4`
[1] 26.66364
$`6`
[1] 19.74286
$`8`
[1] 15.1
Matrix as the output
The map family includes functions that organize the output in different data structures; their names follow the pattern map_*. As we’ve seen, the map function returns a list. The following functions return a vector of a specific type, e.g. map_lgl returns a logical vector and map_chr returns a character vector. It is also possible to return the results as data frames by row binding (map_dfr) or column binding (map_dfc).
str_dat |>
  map_dbl(.f = ~mean(.x$mpg)) # returns a vector of doubles
4 6 8
26.66364 19.74286 15.10000
str_dat |>
  map_dfr(.f = ~colMeans(.x)) # return a data frame by row binding
# A tibble: 3 × 11
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 26.7 4 105. 82.6 4.07 2.29 19.1 0.909 0.727 4.09 1.55
2 19.7 6 183. 122. 3.59 3.12 18.0 0.571 0.429 3.86 3.43
3 15.1 8 353. 209. 3.23 4.00 16.8 0 0.143 3.29 3.5
str_dat |>
  map_dfc(.f = ~colMeans(.x)) # return a data frame by col binding
# A tibble: 11 × 3
`4` `6` `8`
<dbl> <dbl> <dbl>
1 26.7 19.7 15.1
2 4 6 8
3 105. 183. 353.
4 82.6 122. 209.
5 4.07 3.59 3.23
6 2.29 3.12 4.00
7 19.1 18.0 16.8
8 0.909 0.571 0
9 0.727 0.429 0.143
10 4.09 3.86 3.29
11 1.55 3.43 3.5
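The typed variants map_lgl and map_chr mentioned above are not used in the examples so far. Here is a minimal sketch on a small made-up list (vals is hypothetical); map_int is included as one more typed variant.

```r
library(purrr)

# Made-up list with elements of different types and lengths
vals <- list(a = 1:3, b = letters[1:2], c = c(2.5, 3.5))

is_num  <- map_lgl(vals, is.numeric)  # named logical vector
classes <- map_chr(vals, class)       # named character vector
lengths <- map_int(vals, length)      # named integer vector

is_num
classes
lengths
```

Each typed variant expects the function to return a single value of the matching type for every element, so the result is always a vector of the same length as the input.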
Multiple Input
Some operations require a pair of variables as input. While this is still manageable with map, purrr provides better options, specifically map2 for two inputs and pmap for any number of inputs.
map_avg <- map_dbl(.x = mtcars, .f = mean)

map2_avg <- map2_dbl(.x = mtcars,
                     .y = list(weight = 1/nrow(mtcars)),
                     .f = ~sum(.x*.y))
identical(map_avg, map2_avg)
[1] TRUE
pmap_avg <- pmap_dbl(list(x = mtcars,
                          y = list(weight = 1/(2*nrow(mtcars))),
                          z = list(weight2 = 2)),
                     .f = ~sum(..1*..2*..3))
identical(map_avg, pmap_avg)
[1] TRUE
# Use element names in pmap
mtcars$weight <- 1/2
mtcars$weight2 <- 2
pmap_eg2 <- pmap_dbl(mtcars,
                     .f = function(mpg, weight, weight2, ...){
                       mpg * weight * weight2
                     })
identical(pmap_eg2, mtcars$mpg)
[1] TRUE
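In the examples above, the second input to map2 has length one and is reused for every column. As a complementary sketch, the example below pairs elements position by position; base_vals, exponents and the list passed to pmap_dbl are made-up data, and the \(x) shorthand assumes R 4.1 or later.

```r
library(purrr)

# Made-up paired inputs of equal length
base_vals <- c(2, 3, 4)
exponents <- c(3, 2, 1)

# map2: the i-th element of each input is passed to the function together
powers <- map2_dbl(base_vals, exponents, \(b, e) b^e)
powers

# pmap: the same idea for three (or more) inputs, matched by name
sums <- pmap_dbl(list(a = 1:3, b = 4:6, c = 7:9), \(a, b, c) a + b + c)
sums
```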
No output
Some operations are run only for their side effects and do not need to return anything during the iteration, e.g. saving a dataset. In this case, map still returns a list, here a list of NULL values. Consider using walk instead. The function walk applies the function in the same way as map, but it returns its input invisibly, so nothing is printed.
tmp_fldr <- tempdir()
map2(.x = str_dat,
.y = 1:length(str_dat),
.f = ~saveRDS(.x,
file = paste0(tmp_fldr, "/",.y, ".rds"))
)
$`4`
NULL
$`6`
NULL
$`8`
NULL
# No output
walk2(.x = str_dat,
.y = (1:length(str_dat)),
.f = ~saveRDS(.x,
file = paste0(tmp_fldr, "/",.y, ".rds"))
)
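Because the walk functions return their input invisibly, they can sit in the middle of a pipeline. The sketch below uses iwalk, which passes each element together with its name; the folder name walk_demo and the file name pattern are hypothetical choices made for this illustration.

```r
library(purrr)

# Hypothetical output folder inside the session's temporary directory
out_dir <- file.path(tempdir(), "walk_demo")
dir.create(out_dir, showWarnings = FALSE)

pieces <- split(mtcars, mtcars$cyl)

# iwalk() calls the function with (element, name) and returns `pieces` invisibly
returned <- iwalk(
  pieces,
  \(dat, name) saveRDS(dat, file.path(out_dir, paste0("cyl_", name, ".rds")))
)

identical(returned, pieces)
list.files(out_dir)
```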
Other functions in purrr
reduce and accumulate
purrr also provides a function to summarize a list with a preferred operator, namely reduce. Its variant accumulate returns the history of this reduction process, i.e. every intermediate result.
mtcars$weight <- 1/(2*nrow(mtcars))
mtcars$weight2 <- 2
reduce_eg <- pmap_dbl(mtcars,
                      .f = function(mpg, weight, weight2, ...){
                        mpg * weight * weight2
                      }) |>
  reduce(`+`)

pmap_dbl(mtcars,
         .f = function(mpg, weight, weight2, ...){
           mpg * weight * weight2
         }) |>
  head() |> # Only show the first 6 operations
  accumulate(`+`)
[1] 0.656250 1.312500 2.025000 2.693750 3.278125 3.843750
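To see the relationship between reduce and accumulate on a smaller scale, here is a minimal sketch with the integers 1 to 5 (the object names are hypothetical). It also shows the .init argument of reduce, which supplies a starting value.

```r
library(purrr)

x <- 1:5

# reduce() collapses the vector to one value: ((((1 + 2) + 3) + 4) + 5)
total <- reduce(x, `+`)
total

# accumulate() keeps every intermediate result of the same reduction
running <- accumulate(x, `+`)
running

# .init provides a starting value for the reduction
with_init <- reduce(x, `+`, .init = 100)
with_init
```

The last element of the accumulate result equals the reduce result, since both perform the same sequence of additions.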
Working with lists
Let’s move to the purrr cheatsheet at https://github.com/rstudio/cheatsheets/blob/main/purrr.pdf.
Summary
- Introduction to functional programming.
- The R package purrr provides a nice interface to functional programming and list manipulation.
- The function map and its alternatives map_* provide a neat way to iterate over a list or vector with the output in different data structures.
- The functions map2 and pmap allow more than one list as input.
- The function walk and its alternatives, such as walk2 and pwalk, are called for their side effects and do not print any output.
- The functions reduce and accumulate help to summarize a list with a preferred operator or function.