Project 3

Building websites for R packages; practice functional programming and APIs
Author
Affiliation

Department of Biostatistics, Johns Hopkins

Published

November 28, 2023

Background

library(tidyverse)

Due date: December 12 at 11:59pm

The goal of this assignment is to practice building websites for R packages, along with functional programming and using APIs.

To submit your project

Across the two parts below, you will need to create two separate GitHub repositories for yourself. The links to create the repositories will be in CoursePlus.

The first one (Part 1) will be a public repository to build a website for an R package. It is public because you will need to deploy the website.

The second one (Part 2) will be a private repository to practice using APIs, functional programming, and building data analyses.

Part 1

Here, we will practice using pkgdown. Using any R package with a GitHub repository (that does not already have a pkgdown website), use pkgdown to create a website for the software package.

Note

This could even be a package that you have written (or are working on creating right now). Otherwise, it could be a package that you have used previously, or you can pick one you are not familiar with and just want to know more about!

It should not be the package you created in Project 2 for this course.

Part 1A: Create website locally

Fork the GitHub repository from the original location to your own GitHub account. Clone the repository to your local computer.

Use usethis and pkgdown to create a website locally for the R package of your choice.
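As a rough sketch of this step, the helper below configures pkgdown and builds the site for a cloned package. The function name build_local_site and the pkg_path argument are made up for illustration; usethis::use_pkgdown() and pkgdown::build_site() are the calls doing the work, and both packages are assumed to be installed.

```r
# Hypothetical helper: set up and build a pkgdown site for a cloned package.
# `pkg_path` is assumed to point at the root of the package repository.
build_local_site <- function(pkg_path = ".") {
  # A package root always contains a DESCRIPTION file; fail early otherwise.
  if (!file.exists(file.path(pkg_path, "DESCRIPTION"))) {
    stop("No DESCRIPTION file found in ", pkg_path, call. = FALSE)
  }

  # One-time setup: creates _pkgdown.yml and updates .Rbuildignore.
  usethis::with_project(pkg_path, usethis::use_pkgdown())

  # Render the site (written to docs/ by default).
  pkgdown::build_site(pkg = pkg_path)
}

# Example call from the package root:
# build_local_site(".")
```

Re-running pkgdown::build_site() after each change to _pkgdown.yml or the vignettes is usually enough to refresh the local site.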

Part 1B: Customize the website

Here, you need to customize the website in at least 5 ways. How you customize is up to you. The pkgdown website has lots of suggestions for you to try out!

Part 1C: Create an example data analysis

In this part, you will create a data analysis (or a case study) where you demonstrate the functions in the R package. Specifically, you will add another article or vignette titled “Example analysis” inside the /vignettes folder.
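One possible way to scaffold the new vignette is with usethis::use_vignette(), which creates an R Markdown file under vignettes/. The wrapper name add_example_analysis and the file name "example-analysis" are illustrative choices, not requirements of the assignment.

```r
# Hypothetical helper: add an "Example analysis" vignette to a package.
# `pkg_path` is assumed to point at the root of the package repository.
add_example_analysis <- function(pkg_path = ".") {
  if (!file.exists(file.path(pkg_path, "DESCRIPTION"))) {
    stop("No DESCRIPTION file found in ", pkg_path, call. = FALSE)
  }

  # Creates vignettes/example-analysis.Rmd with the requested title.
  usethis::with_project(
    pkg_path,
    usethis::use_vignette("example-analysis", title = "Example analysis")
  )
}

# Example call from the package root:
# add_example_analysis(".")
```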

Similar to Project 2, you must pick out a data set from TidyTuesday that you have not worked with before (i.e. not in a previous project or assignment from this class or from 776, but other classes or personal projects are acceptable). You must also demonstrate wrangling and plotting the data. Finally, your example analysis must also demonstrate at least 2 functions from the R package in some way in the vignette.

Other requirements for the vignette are the following:

  1. Pick any data set you wish from TidyTuesday to analyze.
  • You must describe the question you aim to answer with the data and data analysis.
  • You must describe where the original data come from and link to the source.
  • You must include a link to a data dictionary for the data or create one inside the webpage.
  2. Load the data into R.
  • In this step, you must test if a directory named data exists locally. If it does not, write an R function that creates it programmatically.
  • Save the data only once (not each time you knit/render the document).
  • Read in the data locally each time you knit/render.
  3. Your analysis must include some form of data wrangling and data visualization.
  • You must use at least six different functions from dplyr, tidyr, lubridate, stringr, or forcats.
  • You must use at least two functions from purrr.
  • Your analysis should include at least three plots that use at least three different geom_*() functions from ggplot2 (or another package with geom_*() functions).
    • Plots should have titles, subtitles, captions, and human-understandable axis labels.
    • At least one plot should use a type of faceting (facet_grid() or facet_wrap()).
  4. Apply at least 2 functions from the R package in the vignette.
  5. Summarize and interpret the results in 1-2 sentences.
  6. At the end of the data analysis, list out each of the functions you used from each of the packages (dplyr, tidyr, ggplot2, etc) to help the TA verify that you met all the requirements described above.
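The "Load the data into R" requirements above can be sketched as follows. This is only one way to do it: the helper names ensure_data_dir and read_cached, the .rds file format, and the fetch argument (a function that downloads the data) are all assumptions made for illustration.

```r
# Create `dir` if it does not exist yet; return the path invisibly.
ensure_data_dir <- function(dir = "data") {
  if (!dir.exists(dir)) {
    dir.create(dir, recursive = TRUE)
  }
  invisible(dir)
}

# Call `fetch()` and save its result only when no local copy exists,
# then always read the data back from the local copy.
read_cached <- function(file_name, fetch, dir = "data") {
  ensure_data_dir(dir)
  path <- file.path(dir, file_name)
  if (!file.exists(path)) {
    saveRDS(fetch(), path)
  }
  readRDS(path)
}

# Hypothetical usage; replace the placeholder with the URL of the
# TidyTuesday CSV you chose:
# dat <- read_cached("tuesdata.rds", function() readr::read_csv("<url-to-csv>"))
```

Because the download is wrapped in a function, it only runs on the first render; later renders skip straight to readRDS().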

Part 1D: Create a README.md file

If the package does not already include one, create a README.md file in the folder on your computer that holds the R package and pkgdown files, and add the information below.

If it already has a README.md file, just edit the top of the file with the following information:

  • Include a URL to the GitHub repository where the original R package came from.
  • Include a URL to the deployed website from Part 1E. It should look something like https://jhu-statprogramming-fall-2022.github.io/biostat840-project3-pkgdown-<your_github_username>.
  • Include a description of the 5 things you customized in your pkgdown website (excluding adding the example data analysis from Part 1C).

The readme must also include (if it does not already):

  • The title of the package
  • The original author of the package (and you, as the person who made the website and example data analysis)
  • A goal / description of the package
  • A list of exported functions that are in the package. Briefly describe each function.
  • A basic example with one of the functions.

Part 1E: Deploy the website

The link to create a public GitHub repository for yourself to complete this part of Project 3 will be posted in CoursePlus. This creates an empty GitHub repository.

When ready, deploy the website.

Note

You need to modify the template code that is provided to you from GitHub when you set the remote. When you fork the repository, there will already be a remote origin (from where you cloned the remote repository to your local repository), which you can see with

Bash
git remote -v

In GitHub Classroom, you will create a repo for this part of the project. When you click on the link in the CoursePlus discussion forum to create the repo, you will see this line:

Bash
git remote add origin <link>

Instead of pushing to the forked repository, you want to change where you push your code (i.e. you want to push to the repo on GitHub Classroom). To do this, change the above line to the following, where <link> is the GitHub Classroom link:

Bash
git remote add upstream <link>

and when you push your code, you want to use git push -u upstream main, for example (not git push -u origin main).

Part 2

Here, we will practice using APIs and making data visualizations.

For this part of Project 3, you need to create a private GitHub repository for yourself. The link to create it will be posted in CoursePlus. This creates an empty GitHub repository. You need to show all your code and submit both the .qmd file and the rendered HTML file.

Note

When you use an API, you want to figure out the data you want to extract and then save it locally so that you are not using the API each time you knit or render your data analysis.

Most APIs limit the number of requests you can make in a given hour, and your IP address can be blocked if you send too many requests within a short time.

The API we will use is tidycensus (https://walker-data.com/tidycensus), which is an R package that allows users to interface with a select number of the US Census Bureau’s data APIs and return tidyverse-ready data frames, optionally with simple feature geometry included.

The goal of this part is to create a data analysis (or a case study) using the US Census Bureau’s data.

Important

To use this API, you must obtain an API key, which can be found at the top of this page:

  • https://walker-data.com/tidycensus/articles/basic-usage.html
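Once you have a key, one option is to keep it out of the .qmd file by storing it in the CENSUS_API_KEY environment variable (for example in ~/.Renviron) and registering it at the start of the session. The wrapper name register_census_key is made up for this sketch; tidycensus::census_api_key() is the package function that registers the key.

```r
# Hypothetical helper: register the Census API key for this R session.
# By default the key is read from the CENSUS_API_KEY environment variable,
# so it never has to be written into a file that is committed to GitHub.
register_census_key <- function(key = Sys.getenv("CENSUS_API_KEY")) {
  if (!nzchar(key)) {
    stop("Set CENSUS_API_KEY (for example in ~/.Renviron) before rendering.",
         call. = FALSE)
  }
  tidycensus::census_api_key(key)
  invisible(TRUE)
}

# Example call near the top of the analysis:
# register_census_key()
```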

Other requirements for this part are the following:

  1. You must describe the question you aim to answer with the data and data analysis.
  2. You must use at least three different calls to the tidycensus API to extract different datasets. For example, these could be across years, locations, or variables.
  • In this step, you must test if a directory named data exists locally. If it does not, write an R function that creates it programmatically.
  • Save the data only once (not each time you knit/render the document).
  • Read in the data locally each time you knit/render.
  3. Your analysis must include some form of data wrangling and data visualization.
  • You must use at least six different functions from dplyr, tidyr, lubridate, stringr, or forcats.
  • You must use at least two functions from purrr.
  • Your analysis should include at least three plots that use at least three different geom_*() functions from ggplot2 (or another package with geom_*() functions).
    • Plots should have titles, subtitles, captions, and human-understandable axis labels.
    • At least one plot should use a type of faceting (facet_grid() or facet_wrap()).
  4. Summarize and interpret the results in 1-2 sentences.
  5. At the end of the data analysis, list out each of the functions you used from each of the packages (dplyr, tidyr, ggplot2, etc) to help the TA verify that you met all the requirements described above.
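As an illustration of requirement 2, the sketch below makes one tidycensus call per year with purrr and caches each result in the data directory. The query itself (county-level median household income, ACS variable B19013_001, in Maryland), the years, and the helper names fetch_income, get_year, and get_years are all assumptions chosen for the example; substitute your own geography, variables, and years.

```r
# Assumed example query: one tidycensus call for a single year.
fetch_income <- function(year, state = "MD") {
  tidycensus::get_acs(
    geography = "county",
    variables = "B19013_001",  # median household income
    state = state,
    year = year
  )
}

# Call the API only if this year's file is missing, then read the file.
get_year <- function(year, fetch = fetch_income, dir = "data") {
  if (!dir.exists(dir)) {
    dir.create(dir, recursive = TRUE)
  }
  path <- file.path(dir, paste0("income_", year, ".rds"))
  if (!file.exists(path)) {
    saveRDS(fetch(year), path)
  }
  readRDS(path)
}

# Map over several years and stack the results, keeping the year as a column.
get_years <- function(years, ...) {
  years |>
    purrr::set_names() |>
    purrr::map(get_year, ...) |>
    purrr::list_rbind(names_to = "year")
}

# Three separate API calls on the first render, none afterwards:
# income <- get_years(c(2015, 2018, 2021))
```

Passing the download step in as the fetch argument keeps the caching logic separate from the API call, which also makes it easy to try the helper with a small fake data frame before spending any API requests.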