Parallel programming

and dealing with large data…
Affiliation: Department of Biostatistics, Johns Hopkins

Published: November 19, 2024

Pre-lecture activities


In advance of class, please install the following R package:

  • future - this provides a unified parallel framework in R consistent

You can do this by calling

install.packages("future")

Then load the package with:

library(future)

In addition, please read through

How much should I prepare before class?

You should have future installed and be familiar with its three basic functions: plan(), future(), and value().

We will learn more about these functions in class.
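As a minimal sketch of how these three functions fit together, assuming future is installed: plan() chooses how futures are resolved, future() starts evaluating an expression and returns immediately, and value() waits for the result. The `slow_square()` helper is hypothetical and only simulates slow work, and `workers = 2` is an arbitrary choice for illustration.

```r
library(future)

# Choose how futures are resolved: "multisession" evaluates them in
# background R sessions on this machine (two workers, chosen arbitrarily).
plan(multisession, workers = 2)

# Hypothetical helper that simulates a slow computation.
slow_square <- function(x) {
  Sys.sleep(1)
  x^2
}

# future() hands the expression to a worker and returns right away;
# slow_square() is detected as a global and shipped to the worker.
f1 <- future(slow_square(3))
f2 <- future(slow_square(4))

# value() blocks until each future is resolved, then returns its result.
v1 <- value(f1)
v2 <- value(f2)
print(c(v1, v2))

# Return to ordinary sequential evaluation (this also shuts down the workers).
plan(sequential)
```

Because both futures are created before either value() call, the two one-second sleeps can run at the same time on the two workers, so the waiting is roughly halved compared with running them one after the other (ignoring the time it takes to start the workers).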

Lecture

Acknowledgements

Material for this lecture was borrowed and adapted from

Learning objectives


At the end of this lesson you will:

  • Understand the basics of parallel computing
  • Become familiar with basic functions in the future package
  • Recognize different file formats for working with large data that is not held locally
  • Implement three ways to work with large data:
    1. “sample and model”
    2. “chunk and pull”
    3. “push compute to data”
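To make the first two strategies concrete, here is a minimal base R sketch. It simulates a "large" data set in memory purely for illustration (in practice the data would live in a file or database too big to load at once), and the `fetch_chunk()` helper is hypothetical, standing in for whatever query or file read pulls one piece of the data. The third strategy, "push compute to data", needs a database backend that runs the computation itself, so it is not shown here.

```r
set.seed(1)

# Simulated stand-in for a large data set (illustration only).
n <- 1e6
big <- data.frame(
  group = sample(c("a", "b", "c"), n, replace = TRUE),
  x = rnorm(n)
)
big$y <- 2 * big$x + rnorm(n)

# 1. "Sample and model": fit the model on a random subset of rows
#    instead of the full data.
idx <- sample(n, 10000)
fit <- lm(y ~ x, data = big[idx, ])
slope_sample <- unname(coef(fit)["x"])

# 2. "Chunk and pull": pull one chunk (here, one group) at a time,
#    compute on it, and keep only the small per-chunk result (the mean of y).
fetch_chunk <- function(data, g) {
  # Hypothetical helper: in practice this would be a query or file read
  # returning only the rows for group g.
  data[data$group == g, ]
}
chunk_means <- vapply(c("a", "b", "c"), function(g) {
  chunk <- fetch_chunk(big, g)
  mean(chunk$y)
}, numeric(1))

print(slope_sample)
print(chunk_means)
```

In a real setting only one chunk (or one sample) would need to be held in memory at a time; the per-chunk computations are also independent, which makes them natural candidates for running in parallel with futures.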

Slides

Class activity

For the rest of the time in class, we will practice using the future package. There are two tutorials, developed by Henrik Bengtsson for the useR! 2024 conference, for you to work through on your own:

Also, if you would like to try out the three strategies for dealing with large data that we learned about in class today, try working through the pre-reading: