Reproducible Workflows with targets

A Make-like pipeline tool for creating reproducible workflows in R
module 3
week 6
project management
targets
Author
Affiliation

Department of Biostatistics, Johns Hopkins

Published

November 30, 2023

Pre-lecture materials

Read ahead

Before class, you can prepare by reading the following materials:

  1. https://books.ropensci.org/targets/
  2. https://books.ropensci.org/targets/walkthrough.html

Prerequisites

Before starting, you must install the following additional packages:

  • targets - a pipeline tool for reproducible workflows in R
  • usethis - an automation package that simplifies project creation and setup
  • renv - a dependency manager for R projects

You can do this by calling

install.packages(c("usethis", "targets", "renv"))

Acknowledgements

Material for this lecture was borrowed and adapted from

  • https://books.ropensci.org/targets/walkthrough.html

Learning objectives

At the end of this lesson you will be able to:

  • set up a targets analytic pipeline
  • write and run a data analysis with targets
  • replicate and retrieve analysis results from a targets workflow

Overview of targets

targets is an R package that helps you manage your analysis. You can think of targets as a butler for your analytic workflow. Its services include:

  • managing the order of your analysis steps so you do not confuse them when reproducing an analysis
  • saving the output of each analytic step so you do not have to rerun steps whose results have not changed
  • monitoring changes in your code so that only the affected steps are rerun
  • reproducing the whole analysis with a single command so you do not have to run multiple scripts
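As a rough illustration of the services listed above, the sketch below builds a tiny pipeline in a temporary directory, changes the code of one step, and checks which steps targets considers out of date. The target names raw and center and the toy data are hypothetical and chosen only for this example.

```r
library(targets)

# Work in a throwaway directory so the sketch does not touch a real project.
demo_dir <- tempfile("targets_sketch_")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)

# Hypothetical two-step pipeline: `center` depends on `raw`.
tar_script({
  list(
    tar_target(raw, data.frame(x = c(1, 2, 3, 10))),
    tar_target(center, mean(raw$x))
  )
}, ask = FALSE)

tar_make()                          # builds raw first, then center
stale_after_run <- tar_outdated()   # nothing is out of date after a full run

# Change only the command of the downstream step.
tar_script({
  list(
    tar_target(raw, data.frame(x = c(1, 2, 3, 10))),
    tar_target(center, median(raw$x))
  )
}, ask = FALSE)

stale_after_edit <- tar_outdated()  # only "center" needs to be rebuilt
tar_make()                          # skips raw, reruns center
center_value <- tar_read(center)    # retrieve the stored result: 2.5

setwd(old_wd)
```

In a real project you would edit _targets.R by hand rather than regenerate it with tar_script(); the function is used here only to keep the sketch self-contained.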

Why do we use targets?

targets helps us manage analytic workflows more efficiently, improving productivity with minimal effort. Put simply, keeping track of file names for your code and saved objects can be very painful. targets handles much of that for you, especially when used in combination with a version control system such as git.

How to use targets

The {targets} R package user manual is a great resource for learning how to use targets. The introductory tutorial is well documented in Chapter 2, Walkthrough.

Instead of going through the chapter with you, I will focus on some tricks that are not discussed in the user manual.

Set up a targets workflow

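# Start a new R project (in RStudio this opens the project in a new session)
usethis::create_project("targets_eg")
# Inside the new project, set up the targets workflow (writes a template _targets.R)
targets::use_targets()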
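After this setup, the pipeline itself lives in the _targets.R file at the project root. The following is only a sketch of how an edited _targets.R might look, not the exact template that use_targets() writes. The file data/raw.csv, the function fit_model(), and the target names are hypothetical; fit_model() is assumed to be defined in a script inside the R/ folder.

```r
# _targets.R (sketch; file, function, and target names are hypothetical)
library(targets)

# Packages that the targets need when they run.
tar_option_set(packages = c("stats"))

# Source custom functions, assuming they live in scripts under R/.
tar_source()

# The pipeline is a list of targets; targets works out the order
# from the names each command refers to.
list(
  tar_target(data_file, "data/raw.csv", format = "file"),  # track the input file
  tar_target(raw, read.csv(data_file)),                    # depends on data_file
  tar_target(model, fit_model(raw))                        # depends on raw
)
```

With a file like this in place, targets::tar_make() runs the pipeline and targets::tar_read(model) retrieves a stored result.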

(Optional) Version control packages with renv

# Initialize renv for the project
renv::init()
# Write _targets_packages.R so renv can detect the packages used by the pipeline
targets::tar_renv()

If someone else opens this project on a different computer, they can call renv::restore() to install the necessary packages at the same versions recorded for the project.

Important renv functions

Ideally, you should keep track of the R packages used in every analysis, much as you version control your files with git. You may need to call the following functions periodically, for example after you add or remove packages.

  • targets::tar_renv() updates _targets_packages.R by gathering all packages used in your analytic workflow
  • renv::status() reports differences between the packages your project uses, the project library, and the lockfile
  • renv::snapshot() records the package versions currently in your project library in the lockfile (renv.lock)
  • renv::restore() installs the package versions recorded in the lockfile, restoring packages that are missing or whose versions do not match
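One possible order for calling these functions after changing the packages your pipeline uses is sketched below. It assumes you run it interactively from the root of a project where renv has already been initialized; treat it as a suggestion rather than a required sequence.

```r
# After adding or removing a package in the pipeline:
targets::tar_renv()   # rewrite _targets_packages.R so renv can see pipeline packages
renv::status()        # check whether the library and the lockfile are out of sync
renv::snapshot()      # record the current package versions in renv.lock

# On another machine, or after pulling an updated renv.lock:
renv::restore()       # install the package versions recorded in renv.lock
```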

For more information, visit https://rstudio.github.io/renv/articles/renv.html

Set up keyboard shortcuts

targets provides RStudio addins that let users manage a workflow with a point-and-click interface. For example, if you click the Addins button in the toolbar at the top of the RStudio window (highlighted in the screen capture below), you will see many options that help you work with targets.

A screenshot of addins for targets

With these addins, you don’t necessarily have to remember all the functions to run targets, such as targets::tar_make(), targets::tar_load(), targets::tar_visnetwork(), etc.

If you prefer keyboard shortcuts, you can set them up for these commonly used functions. To do so, go to Tools -> Modify Keyboard Shortcuts.

A screenshot of how to modify keyboard shortcuts

Within the pop-up keyboard shortcuts menu, you can type addin, targets, or the name of a specific targets addin (e.g. Load target at cursor) in the search box. You can customize a keyboard shortcut by clicking the input box in the Shortcut column.

A screenshot of keyboard shortcuts menu

Summary

targets is a workflow management powerhouse. It offers much more than we covered today. The {targets} R package user manual does an excellent job of explaining how to set up parallel computing, how to work with markdown documents (I managed my dissertation writing in targets), and much more.

Nevertheless, I should warn you that learning targets can be intimidating at first because of the setup process and new syntax. It may take several iterations or projects before you are comfortable using it.

But it is very rewarding and can save you a lot of time in the long run! It is a worthy investment of time.

Additional Resources

Tip