Project 3
Background
Due date: December 13 at 11:59pm
The goal of this assignment is to practice building websites for R packages, along with practicing functional programming and using APIs.
To submit your project
In both parts below, you will need to create two separate GitHub repositories for yourself. The links to create the repositories will be in CoursePlus.
The first one (Part 1) will be a public repository to build a website for an R package. It is public because you will need to deploy the website.
The second one (Part 2) will be a private repository to practice using two different APIs, practice functional programming, and building data analyses.
Part 1
Here, we will practice using pkgdown. Using any R package with a GitHub repository (that does not already have a pkgdown website), use pkgdown to create a website for the software package.
Part 1A: Create website locally
Fork the GitHub repository from the original location to your own GitHub account. Clone the repository to your local computer.
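If you prefer to do both steps from R, a minimal sketch using usethis (the "owner/pkgname" repo spec is a placeholder):

```r
# Fork the upstream repository to your GitHub account and clone it
# locally in one step; "owner/pkgname" is a placeholder for the real repo
usethis::create_from_github("owner/pkgname", fork = TRUE)
```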
Use usethis and pkgdown to create a website locally for the R package of your choice.
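A minimal sketch of the local setup, run from within the package directory:

```r
# Set up the pkgdown infrastructure (_pkgdown.yml, .gitignore entries, etc.)
usethis::use_pkgdown()

# Build the site locally and open a preview in the browser
pkgdown::build_site()
```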
Part 1B: Customize the website
Here, you need to customize the website in at least 5 ways. How you customize is up to you. The pkgdown website has lots of suggestions for you to try out!
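Many customizations live in the _pkgdown.yml configuration file. The snippet below is only an illustrative sketch of two possible customizations (a Bootswatch theme and a reordered navbar); see the pkgdown documentation for the full set of options:

```yaml
# _pkgdown.yml -- illustrative customizations, not requirements
template:
  bootstrap: 5
  bootswatch: flatly

navbar:
  structure:
    left: [intro, reference, articles]
    right: [github]
```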
Part 1C: Create an example data analysis
In this part, you will create a data analysis (or a case study) where you demonstrate the functions in the R package. Specifically, you will add another article or vignette titled “Example analysis” inside the /vignettes folder.
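One way to scaffold the vignette with usethis (the file name is your choice):

```r
# Creates vignettes/example-analysis.Rmd with the given title
usethis::use_vignette("example-analysis", title = "Example analysis")
```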
Similar to Project 2, you must pick out a data set from TidyTuesday that you have not worked with before (i.e. not in a previous project or assignment from this class or from 776, but other classes or personal projects are acceptable). You must also demonstrate wrangling and plotting the data. Finally, your example analysis must also demonstrate at least 2 functions from the R package in some way in the vignette.
Other requirements for this part of the vignette are the following:
- Pick any data set you wish from TidyTuesday to analyze.
- You must describe the question you aim to answer with the data and data analysis.
- You must describe and link to the original source of the data you chose.
- You must include a link to a data dictionary for the data or create one inside the webpage.
- Load the data into R.
  - In this step, you must test if a directory named data exists locally. If it does not, write an R function that creates it programmatically (see the sketch after this list).
  - Save the data only once (not each time you knit/render the document).
  - Read in the data locally each time you knit/render.
- Your analysis must include some form of data wrangling and data visualization.
- You must use at least six different functions from dplyr, tidyr, lubridate, stringr, or forcats.
- You must use at least two functions from purrr.
- Your analysis should include at least three plots, using at least three different geom_*() functions from ggplot2 (or another package with geom_*() functions).
  - Plots should have titles, subtitles, captions, and human-understandable axis labels.
  - At least one plot should use a type of faceting (facet_grid() or facet_wrap()).
- Apply at least 2 functions from the R package in the vignette.
- Summarize and interpret the results in 1-2 sentences.
- At the end of the data analysis, list each of the functions you used from each of the packages (dplyr, tidyr, ggplot2, etc.) to help the TA confirm that you met all the requirements described above.
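Below is a minimal sketch of the directory check and the "save once, read locally" pattern referenced above; the TidyTuesday date and file name are placeholders, not the data set you should use.

```r
library(here)

# Create the data/ directory programmatically if it does not exist
make_data_dir <- function(path = here("data")) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  invisible(path)
}
make_data_dir()

# Download the data once; afterwards, read the cached copy on every render
rds_path <- here("data", "tuesdata.rds")
if (!file.exists(rds_path)) {
  tuesdata <- tidytuesdayR::tt_load("2022-01-18") # placeholder date
  saveRDS(tuesdata, rds_path)
}
tuesdata <- readRDS(rds_path)
```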
Part 1D: Create a README.md file
If the package does not already include one, create a README.md file in the folder where the R package and pkgdown files are on your computer and add the information below.
If it already has a README.md file, just edit the top of the file with the following information:
- Include a URL to the GitHub repository where the original R package came from.
- Include a URL to the deployed website that you will create in Part 1E; it should be something like https://jhu-statprogramming-fall-2022.github.io/biostat840-project3-pkgdown-<your_github_username>.
- Include a description of the 5 things you customized in your pkgdown website (excluding adding the example data analysis from Part 1C).
The readme must also include (if it does not already):
- The title of the package
- The original author of the package (and you, who made the website and example data analysis)
- A goal / description of the package
- A list of exported functions that are in the package. Briefly describe each function.
- A basic example with one of the functions.
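If you are starting from scratch, usethis can scaffold a README for you to edit:

```r
# Creates a README.md skeleton at the package root
usethis::use_readme_md()
```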
Part 1E: Deploy the website
The link to create a public GitHub repository for yourself to complete this part of Project 3 will be posted in CoursePlus. This creates an empty GitHub repository.
When ready, deploy the website.
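One option, assuming you deploy with GitHub Pages via GitHub Actions, is the usethis helper below; other deployment routes described in the pkgdown docs are equally acceptable.

```r
# Sets up GitHub Pages plus a GitHub Actions workflow that rebuilds
# and redeploys the pkgdown site on every push
usethis::use_pkgdown_github_pages()
```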
Part 2
Here, we will practice using APIs and making data visualizations.
For this part of Project 3, you need to create a private GitHub repository for yourself; the link to create it will be posted in CoursePlus. This creates an empty GitHub repository. You need to show all your code and submit both the .qmd file and the rendered HTML file.
Part 2A
The first API we will use is tidycensus (https://walker-data.com/tidycensus), which is an R package that allows users to interface with a select number of the US Census Bureau’s data APIs and return tidyverse-ready data frames, optionally with simple feature geometry included.
The goal of this part is to create a data analysis (or a case study) using the US Census Bureau’s data.
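A minimal sketch of making several tidycensus calls, assuming you have requested and stored a Census API key; the variable (median household income), state, and years below are illustrative choices, not requirements:

```r
library(tidycensus)
library(purrr)

# One-time setup: store your Census API key
# census_api_key("YOUR_KEY", install = TRUE)

# Three calls across years, looped with purrr::map()
years <- c(2015, 2018, 2021)
md_income <- map(years, function(yr) {
  get_acs(
    geography = "county",
    variables = "B19013_001", # median household income (placeholder choice)
    state = "MD",
    year = yr
  )
}) |> set_names(years)
```

Looping the calls with purrr::map() as above is one natural way to make multiple API calls while also working toward the purrr requirement.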
Other requirements for this part are the following:
- You must describe the question you aim to answer with the data and data analysis.
- You must use at least three different calls to the tidycensus API to extract different datasets. For example, these could be across years, locations, or variables.
  - In this step, you must test if a directory named data exists locally. If it does not, write an R function that creates it programmatically.
  - Save the data only once (not each time you knit/render the document).
  - Read in the data locally each time you knit/render.
- Your analysis must include some form of data wrangling and data visualization.
- You must use at least six different functions from dplyr, tidyr, lubridate, stringr, or forcats.
- You must use at least two functions from purrr.
- Your analysis should include at least three plots, using at least three different geom_*() functions from ggplot2 (or another package with geom_*() functions).
  - Plots should have titles, subtitles, captions, and human-understandable axis labels.
  - At least one plot should use a type of faceting (facet_grid() or facet_wrap()).
- Summarize and interpret the results in 1-2 sentences.
- At the end of the data analysis, list each of the functions you used from each of the packages (dplyr, tidyr, ggplot2, etc.) to help the TA confirm that you met all the requirements described above.
Part 2B
The second API we will use is the Covid Act Now Data API.
The goal of this part is to create a data analysis (or a case study) using the Covid Act Now Data API.
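A minimal sketch of one call to the API, assuming you have registered for an API key and stored it in an environment variable; the state-level timeseries endpoint below follows the v2 API docs, but double-check the current documentation:

```r
library(jsonlite)

# Assumes the key is stored in an environment variable rather than
# hard-coded in the document
can_key <- Sys.getenv("COVID_ACT_NOW_KEY")

# State-level timeseries for Maryland (placeholder location)
url <- paste0(
  "https://api.covidactnow.org/v2/state/MD.timeseries.json",
  "?apiKey=", can_key
)
md_covid <- fromJSON(url)
```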
Other requirements for this part are the following:
- You must describe the question you aim to answer with the data and data analysis.
- You must use at least three different calls to the Covid Act Now Data API to extract different datasets. For example, these could be across counties, etc.
  - In this step, you must test if a directory named data exists locally. If it does not, write an R function that creates it programmatically.
  - Save the data only once (not each time you knit/render the document).
  - Read in the data locally each time you knit/render.
- Your analysis must include some form of data wrangling and data visualization.
- You must use at least six different functions from dplyr, tidyr, lubridate, stringr, or forcats.
- You must use at least two functions from purrr.
- Your analysis should include at least three plots, using at least three different geom_*() functions from ggplot2 (or another package with geom_*() functions).
  - Plots should have titles, subtitles, captions, and human-understandable axis labels.
  - At least one plot should use a type of faceting (facet_grid() or facet_wrap()).
- Summarize and interpret the results in 1-2 sentences.
- At the end of the data analysis, list each of the functions you used from each of the packages (dplyr, tidyr, ggplot2, etc.) to help the TA confirm that you met all the requirements described above.
- Push your code and rendered HTML to the private repository that you created for yourself.