Functional programming

Using the purrr package to work with functions and vectors
Author
Affiliation

Department of Biostatistics, Johns Hopkins

Published

November 5, 2024

Pre-lecture activities

Important

In advance of class, please install

  • purrr - this provides a consistent functional programming interface to work with functions and vectors

You can do this by calling

install.packages("purrr")

And load the package using:

library(purrr)

In addition, please read through

How much should I prepare for before class?

You should have purrr installed and be familiar with the map() function along with map variants in purrr.

We will learn more about these functions in class, but we will also practice them at the end of class.

Lecture

Acknowledgements

Material for this lecture was borrowed and adopted from

Learning objectives

Learning objectives

At the end of this lesson you will:

  • Be familiar with the concept of functional languages and functional styles of programming
  • Get comfortable with the major functions in purrr (e.g. the map family of functions) provides a nice interface to functional programming and list manipulation
  • Practice the function map and its alternative map_* provide a neat way to iterate over a list or vector with the output in different data structures (instead of a for loop)
  • Recongize the function map2 and pmap allow having more than one list as input.
  • Recongize the function walk and its alternatives walk2, walk_*, which do not provide any output.

Slides

Class activity

For the rest of the time in class, we will use the mtcars dataset to practice using functions from the purrr package. The purpose is to demonstrates how functional programming can simplify repeated operations on datasets. This dataset contains various measurements for different car models. Find a partner and complete the activities below together. After running each step, I encourage you to look at the outputs and transformations to understand how purrr is working with each column in the dataset.

Objectives of the activity
  • Use purrr::map_* functions to calculate summary statistics for each column.
  • Transform specific columns using functions from purrr.
  • Apply a conditional transformation across the dataset.

First, we will load two packages and create a copy of the mtcars dataset

library(purrr)
library(dplyr)

data <- mtcars

Part 1: Basic column summaries

  • Use map_dbl() to calculate the mean of each column in the mtcars dataset.

Hint: map_dbl(data, mean) will apply the mean function to each column and return a vector of means.

  • Use map_dbl() to calculate the standard deviation of each column.

  • Use map() to create a named list where each element is the range of values in a column.

# Calculate the mean of each column
column_means <- map_dbl(data, mean)
column_means
       mpg        cyl       disp         hp       drat         wt       qsec 
 20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750 
        vs         am       gear       carb 
  0.437500   0.406250   3.687500   2.812500 
# Calculate the standard deviation of each column
column_sds <- map_dbl(data, sd)
column_sds
        mpg         cyl        disp          hp        drat          wt 
  6.0269481   1.7859216 123.9386938  68.5628685   0.5346787   0.9784574 
       qsec          vs          am        gear        carb 
  1.7869432   0.5040161   0.4989909   0.7378041   1.6152000 
# Calculate the range of each column
column_ranges <- map(data, range)
column_ranges
$mpg
[1] 10.4 33.9

$cyl
[1] 4 8

$disp
[1]  71.1 472.0

$hp
[1]  52 335

$drat
[1] 2.76 4.93

$wt
[1] 1.513 5.424

$qsec
[1] 14.5 22.9

$vs
[1] 0 1

$am
[1] 0 1

$gear
[1] 3 5

$carb
[1] 1 8

The first one will return a named vector of mean values for each column. The last one returns a named list where each element is a vector of the minimum and maximum values for each column.

Part 2: Conditional transformation with map_if()

  • Use map_if() to multiply by 100 all columns where the mean is less than 20. For example, if the column represents miles per gallon, multiply each value by 100.
  • Check the resulting data frame.
transformed_data <- map_if(data, ~ mean(.x) < 20, ~ .x * 100)
transformed_data <- as.data.frame(transformed_data)  # Convert back to data frame if desired
transformed_data
    mpg cyl  disp  hp drat    wt qsec  vs  am gear carb
1  21.0 600 160.0 110  390 262.0 1646   0 100  400  400
2  21.0 600 160.0 110  390 287.5 1702   0 100  400  400
3  22.8 400 108.0  93  385 232.0 1861 100 100  400  100
4  21.4 600 258.0 110  308 321.5 1944 100   0  300  100
5  18.7 800 360.0 175  315 344.0 1702   0   0  300  200
6  18.1 600 225.0 105  276 346.0 2022 100   0  300  100
7  14.3 800 360.0 245  321 357.0 1584   0   0  300  400
8  24.4 400 146.7  62  369 319.0 2000 100   0  400  200
9  22.8 400 140.8  95  392 315.0 2290 100   0  400  200
10 19.2 600 167.6 123  392 344.0 1830 100   0  400  400
11 17.8 600 167.6 123  392 344.0 1890 100   0  400  400
12 16.4 800 275.8 180  307 407.0 1740   0   0  300  300
13 17.3 800 275.8 180  307 373.0 1760   0   0  300  300
14 15.2 800 275.8 180  307 378.0 1800   0   0  300  300
15 10.4 800 472.0 205  293 525.0 1798   0   0  300  400
16 10.4 800 460.0 215  300 542.4 1782   0   0  300  400
17 14.7 800 440.0 230  323 534.5 1742   0   0  300  400
18 32.4 400  78.7  66  408 220.0 1947 100 100  400  100
19 30.4 400  75.7  52  493 161.5 1852 100 100  400  200
20 33.9 400  71.1  65  422 183.5 1990 100 100  400  100
21 21.5 400 120.1  97  370 246.5 2001 100   0  300  100
22 15.5 800 318.0 150  276 352.0 1687   0   0  300  200
23 15.2 800 304.0 150  315 343.5 1730   0   0  300  200
24 13.3 800 350.0 245  373 384.0 1541   0   0  300  400
25 19.2 800 400.0 175  308 384.5 1705   0   0  300  200
26 27.3 400  79.0  66  408 193.5 1890 100 100  400  100
27 26.0 400 120.3  91  443 214.0 1670   0 100  500  200
28 30.4 400  95.1 113  377 151.3 1690 100 100  500  200
29 15.8 800 351.0 264  422 317.0 1450   0 100  500  400
30 19.7 600 145.0 175  362 277.0 1550   0 100  500  600
31 15.0 800 301.0 335  354 357.0 1460   0 100  500  800
32 21.4 400 121.0 109  411 278.0 1860 100 100  400  200

This will multiply each element in columns where the mean is less than 20 by 100. Note that the output is a list; converting it back to a data frame makes it easier to view the result.

Part 3: Apply a custom function with map()

  • Write a custom function that takes a column and returns the log-transformed values if the mean of the column is greater than 20, otherwise returns the square root of the values.

  • Use map() to apply this custom function to each column.

transform_column <- function(column) {
  if (mean(column) > 20) {
    log(column)
  } else {
    sqrt(column)
  }
}
transformed_data_custom <- map(data, transform_column)
transformed_data_custom <- as.data.frame(transformed_data_custom)
transformed_data_custom
        mpg      cyl     disp       hp     drat       wt     qsec vs am
1  3.044522 2.449490 5.075174 4.700480 1.974842 1.618641 4.057093  0  1
2  3.044522 2.449490 5.075174 4.700480 1.974842 1.695582 4.125530  0  1
3  3.126761 2.000000 4.682131 4.532599 1.962142 1.523155 4.313931  1  1
4  3.063391 2.449490 5.552960 4.700480 1.754993 1.793042 4.409082  1  0
5  2.928524 2.828427 5.886104 5.164786 1.774824 1.854724 4.125530  0  0
6  2.895912 2.449490 5.416100 4.653960 1.661325 1.860108 4.496665  1  0
7  2.660260 2.828427 5.886104 5.501258 1.791647 1.889444 3.979950  0  0
8  3.194583 2.000000 4.988390 4.127134 1.920937 1.786057 4.472136  1  0
9  3.126761 2.000000 4.947340 4.553877 1.979899 1.774824 4.785394  1  0
10 2.954910 2.449490 5.121580 4.812184 1.979899 1.854724 4.277850  1  0
11 2.879198 2.449490 5.121580 4.812184 1.979899 1.854724 4.347413  1  0
12 2.797281 2.828427 5.619676 5.192957 1.752142 2.017424 4.171331  0  0
13 2.850707 2.828427 5.619676 5.192957 1.752142 1.931321 4.195235  0  0
14 2.721295 2.828427 5.619676 5.192957 1.752142 1.944222 4.242641  0  0
15 2.341806 2.828427 6.156979 5.323010 1.711724 2.291288 4.240283  0  0
16 2.341806 2.828427 6.131226 5.370638 1.732051 2.328948 4.221374  0  0
17 2.687847 2.828427 6.086775 5.438079 1.797220 2.311926 4.173727  0  0
18 3.478158 2.000000 4.365643 4.189655 2.019901 1.483240 4.412482  1  1
19 3.414443 2.000000 4.326778 3.951244 2.220360 1.270827 4.303487  1  1
20 3.523415 2.000000 4.264087 4.174387 2.054264 1.354622 4.460942  1  1
21 3.068053 2.000000 4.788325 4.574711 1.923538 1.570032 4.473254  1  0
22 2.740840 2.828427 5.762051 5.010635 1.661325 1.876166 4.107311  0  0
23 2.721295 2.828427 5.717028 5.010635 1.774824 1.853375 4.159327  0  0
24 2.587764 2.828427 5.857933 5.501258 1.931321 1.959592 3.925557  0  0
25 2.954910 2.828427 5.991465 5.164786 1.754993 1.960867 4.129165  0  0
26 3.306887 2.000000 4.369448 4.189655 2.019901 1.391043 4.347413  1  1
27 3.258097 2.000000 4.789989 4.510860 2.104757 1.462874 4.086563  0  1
28 3.414443 2.000000 4.554929 4.727388 1.941649 1.230041 4.110961  1  1
29 2.760010 2.828427 5.860786 5.575949 2.054264 1.780449 3.807887  0  1
30 2.980619 2.449490 4.976734 5.164786 1.902630 1.664332 3.937004  0  1
31 2.708050 2.828427 5.707110 5.814131 1.881489 1.889444 3.820995  0  1
32 3.063391 2.000000 4.795791 4.691348 2.027313 1.667333 4.312772  1  1
       gear     carb
1  2.000000 2.000000
2  2.000000 2.000000
3  2.000000 1.000000
4  1.732051 1.000000
5  1.732051 1.414214
6  1.732051 1.000000
7  1.732051 2.000000
8  2.000000 1.414214
9  2.000000 1.414214
10 2.000000 2.000000
11 2.000000 2.000000
12 1.732051 1.732051
13 1.732051 1.732051
14 1.732051 1.732051
15 1.732051 2.000000
16 1.732051 2.000000
17 1.732051 2.000000
18 2.000000 1.000000
19 2.000000 1.414214
20 2.000000 1.000000
21 1.732051 1.000000
22 1.732051 1.414214
23 1.732051 1.414214
24 1.732051 2.000000
25 1.732051 1.414214
26 2.000000 1.000000
27 2.236068 1.414214
28 2.236068 1.414214
29 2.236068 2.000000
30 2.236068 2.449490
31 2.236068 2.828427
32 2.000000 1.414214

Part 4: Row-wise transformation and aggregation

  • First, define a function that assigns each row a fuel efficiency category based on mpg:

    • “High” if mpg > 25
    • “Medium” if mpg is between 20 and 25
    • “Low” if mpg < 20
  • Use map() to apply this function row-wise and add the results as a new column in mtcars.

  • Then, split the dataset by fuel efficiency category from the previous step. For each category, calculate the mean values of hp, wt, and qsec (1/4 mile time).

# Function to assign fuel efficiency category based on mpg
fuel_efficiency <- function(row) {
  mpg <- row["mpg"]
  if (mpg > 25) {
    "High"
  } else if (mpg >= 20) {
    "Medium"
  } else {
    "Low"
  }
}

# Add fuel efficiency as a new column
mtcars$fuel_efficiency <- map_chr(asplit(mtcars, 1), fuel_efficiency)
mtcars
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
                    fuel_efficiency
Mazda RX4                    Medium
Mazda RX4 Wag                Medium
Datsun 710                   Medium
Hornet 4 Drive               Medium
Hornet Sportabout               Low
Valiant                         Low
Duster 360                      Low
Merc 240D                    Medium
Merc 230                     Medium
Merc 280                        Low
Merc 280C                       Low
Merc 450SE                      Low
Merc 450SL                      Low
Merc 450SLC                     Low
Cadillac Fleetwood              Low
Lincoln Continental             Low
Chrysler Imperial               Low
Fiat 128                       High
Honda Civic                    High
Toyota Corolla                 High
Toyota Corona                Medium
Dodge Challenger                Low
AMC Javelin                     Low
Camaro Z28                      Low
Pontiac Firebird                Low
Fiat X1-9                      High
Porsche 914-2                  High
Lotus Europa                   High
Ford Pantera L                  Low
Ferrari Dino                    Low
Maserati Bora                   Low
Volvo 142E                   Medium
# Split by fuel efficiency
efficiency_groups <- split(mtcars, mtcars$fuel_efficiency)

# Calculate mean hp, wt, and qsec for each group
efficiency_summary <- map(efficiency_groups, ~ map_dbl(.x[, c("hp", "wt", "qsec")], mean))
efficiency_summary
$High
      hp       wt     qsec 
75.50000  1.87300 18.39833 

$Low
        hp         wt       qsec 
191.944444   3.838833  17.096111 

$Medium
       hp        wt      qsec 
98.250000  2.826875 19.130000 

Part 5: Visualizations and transformations

  • Write a function that, for each row, converts mpg to a “fuel cost” using a hypothetical rate of $3 per gallon for a 100-mile journey where “fuel cost” = (100 mile journey / mpg) * $3 per gallon

  • Use map_dbl() to apply this transformation to the mpg column and add it as a new column called fuel_cost.

  • Using map() with ggplot2, create a bar plot of the average fuel_cost for each cyl. For each cylinder group, plot the average values of mpg, hp, and wt.

# Calculate fuel cost based on mpg
calculate_fuel_cost <- function(mpg) {
  (100 / mpg) * 3
}

# Apply transformation to mpg column
mtcars$fuel_cost <- map_dbl(mtcars$mpg, calculate_fuel_cost)
mtcars
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
                    fuel_efficiency fuel_cost
Mazda RX4                    Medium 14.285714
Mazda RX4 Wag                Medium 14.285714
Datsun 710                   Medium 13.157895
Hornet 4 Drive               Medium 14.018692
Hornet Sportabout               Low 16.042781
Valiant                         Low 16.574586
Duster 360                      Low 20.979021
Merc 240D                    Medium 12.295082
Merc 230                     Medium 13.157895
Merc 280                        Low 15.625000
Merc 280C                       Low 16.853933
Merc 450SE                      Low 18.292683
Merc 450SL                      Low 17.341040
Merc 450SLC                     Low 19.736842
Cadillac Fleetwood              Low 28.846154
Lincoln Continental             Low 28.846154
Chrysler Imperial               Low 20.408163
Fiat 128                       High  9.259259
Honda Civic                    High  9.868421
Toyota Corolla                 High  8.849558
Toyota Corona                Medium 13.953488
Dodge Challenger                Low 19.354839
AMC Javelin                     Low 19.736842
Camaro Z28                      Low 22.556391
Pontiac Firebird                Low 15.625000
Fiat X1-9                      High 10.989011
Porsche 914-2                  High 11.538462
Lotus Europa                   High  9.868421
Ford Pantera L                  Low 18.987342
Ferrari Dino                    Low 15.228426
Maserati Bora                   Low 20.000000
Volvo 142E                   Medium 14.018692
library(ggplot2)

# Calculate average fuel cost for each cylinder group
avg_fuel_cost <- mtcars %>%
  group_by(cyl) %>%
  summarize(avg_fuel_cost = mean(fuel_cost))

# Plot average fuel cost by cylinder count
ggplot(avg_fuel_cost, aes(x = factor(cyl), y = avg_fuel_cost)) +
  geom_bar(stat = "identity") +
  labs(x = "Number of Cylinders", 
       y = "Average Fuel Cost ($)", 
       title = "Fuel Cost by Cylinder Group") +
  theme_minimal()

Post-lecture

Additional practice

Here are some additional practice questions to help you think about the material discussed.

Questions
  1. Use as_mapper() to explore how purrr generates anonymous functions for the integer, character, and list helpers. What helper allows you to extract attributes? Read the documentation to find out.

  2. map(1:3, ~ runif(2)) is a useful pattern for generating random numbers, but map(1:3, runif(2)) is not. Why not? Can you explain why it returns the result that it does?

  3. Can you write a section of code to demonstrate the central limit theorem primarily using the purrr package and/or using the R base package?

  4. Use the appropriate map() function to:

    1. Compute the standard deviation of every column in a numeric data frame.

    2. Compute the standard deviation of every numeric column in a mixed data frame. (Hint: you will need to do it in two steps.)

    3. Compute the number of levels for every factor in a data frame.

  5. The following code simulates the performance of a t-test for non-normal data. Extract the p-value from each test, then visualise.

    trials <- map(1:100, ~ t.test(rpois(10, 10), rpois(7, 10)))
  6. Use map() to fit linear models to the mtcars dataset using the formulas stored in this list:

    data(mtcars)
    formulas <- list(
      mpg ~ disp,
      mpg ~ I(1 / disp),
      mpg ~ disp + wt,
      mpg ~ I(1 / disp) + wt
    )
  7. Fit the model mpg ~ disp to each of the bootstrap replicates of mtcars in the list below, then extract the \(R^2\) of the model fit (Hint: you can compute the \(R^2\) with summary().)

    bootstrap <- function(df) {
      df[sample(nrow(df), replace = TRUE), , drop = FALSE]
    }
    
    bootstraps <- map(1:10, ~ bootstrap(mtcars))
  8. How does purrr make it easier to perform repeated transformations compared to base R? What are some use cases where conditional transformations like map_if() could be helpful in real-world data?