Building an R package, and practicing S3 and regex skills
Due date: Oct 22 at 11:59pm
The goal of this homework is to take some of the functions that you wrote in Project 2 and to put them into an R package. This would allow someone else to easily use your functions by simply installing the R package. In addition, they would receive documentation on how to use the functions.
In addition to building the R package, you will also build an S3 class for your package and create a vignette in which you demonstrate the functions in your R package with a small example dataset from TidyTuesday.
Please build your R package into either a .tar.gz file or a .zip file and upload the file to the dropbox on Courseplus. Please name the package Project3<your last name>. So for example, the name of my package would be Project3Hicks.
The R package must include a vignette in a folder titled vignettes. This document should be an R Markdown document. In the vignette, please show all your code (i.e. make sure to set echo = TRUE).
Before attempting this assignment, you should first install the following packages, if they are not already installed:
install.packages("tidyverse")
install.packages("tidytuesdayR")
install.packages("devtools")
install.packages("roxygen2")
Take the functions that you wrote for Parts 1A-1C and put them into an R package. Your package will have two exported functions for users to call (see below). You will need to write documentation for each function that you export. Your package should include the functions:
- Exp(), which computes the approximation to the exponential function (exported)
- sample_mean(), which calculates the sample mean (not exported)
- sample_sd(), which calculates the sample standard deviation (not exported)
- calculate_CI(), which calculates the confidence intervals from simulated data (exported)
Notes:
- Remember that you should only export the functions that you want the user to use.
- Functions that are not exported do not require any documentation.
- Each exported function should have at least one example of its usage (using the @examples tag in the documentation).
- In the functions in your package, consider using control structures and include checks (e.g. is.na(), is.numeric(), if()) to make sure the input is as you expect it to be. For example, try to break the function with unexpected values that a user might provide (e.g. providing a negative value to a log transformation). This can guide you on how to guard against the ways the function might break.
- Your package should be installable without any warnings or errors.
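As a hedged illustration of the notes above, here is a minimal sketch of a documented, exported function with input checks. The function safe_log() is hypothetical and is not part of the assignment; it only shows one way that roxygen2 tags (@param, @return, @export, @examples) and checks such as is.numeric() might fit together.

```r
#' Natural log with input checks
#'
#' Hypothetical helper used only to illustrate documentation and checks.
#'
#' @param x A numeric vector of strictly positive values.
#' @return A numeric vector the same length as `x`.
#' @export
#' @examples
#' safe_log(c(1, 10, 100))
safe_log <- function(x) {
  # Reject non-numeric input such as character strings
  if (!is.numeric(x)) {
    stop("`x` must be numeric.")
  }
  # Reject missing values (NA or NaN)
  if (anyNA(x)) {
    stop("`x` must not contain missing values.")
  }
  # Reject zero and negative values, where log() is not finite
  if (any(x <= 0)) {
    stop("`x` must be strictly positive.")
  }
  log(x)
}

safe_log(1)
```

Stopping with an informative message is only one design choice; returning NA with a warning would be another, depending on what you want users to experience.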
In this part, you will create a new S3 class called p3_class (Project 3 class) to be used in your Project 3 R package. You will
- Write a constructor function for the p3_class called make_p3_class().
- Write a print() method to work with the p3_class to return a message with the name of the class and the number of observations in the S3 object.
- Modify the calculate_CI() function to work with the p3_class and still return a lower_bound and upper_bound, similar to Project 2.
For example, this is what the output of your code might look like:
> set.seed(1234)
> x <- rnorm(100)
> p3 <- make_p3_class(x)
> print(p3) # explicitly using the print() method
#> a p3_class with 100 observations
> p3 # using autoprinting
#> a p3_class with 100 observations
Calculate a 90% confidence interval:
> calculate_CI(p3, conf = 0.90)
#> lower_bound upper_bound
#> -0.32353231 0.01000883
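The pieces that produce output like the above can be sketched as follows. This is only one possible layout, not the required solution: it assumes the object is a list that stores the data in an element named values (a hypothetical name), it uses base R's mean() and sd() rather than your own sample_mean() and sample_sd(), and it assumes a t-distribution interval for the mean, which may differ from what you wrote in Project 2.

```r
# Constructor: wrap a numeric vector in an object of class "p3_class".
# The element name `values` is an assumption made for this sketch.
make_p3_class <- function(x) {
  stopifnot(is.numeric(x))
  structure(list(values = x), class = "p3_class")
}

# print() method: used for explicit print() calls and for autoprinting.
print.p3_class <- function(x, ...) {
  cat("a", class(x)[1], "with", length(x$values), "observations\n")
  invisible(x)
}

# Generic plus a method, so calculate_CI() dispatches on the class.
calculate_CI <- function(x, conf = 0.95) {
  UseMethod("calculate_CI")
}

calculate_CI.p3_class <- function(x, conf = 0.95) {
  n <- length(x$values)
  m <- mean(x$values)
  se <- sd(x$values) / sqrt(n)
  # Assumed interval: mean +/- t quantile * standard error
  t_crit <- qt(1 - (1 - conf) / 2, df = n - 1)
  c(lower_bound = m - t_crit * se, upper_bound = m + t_crit * se)
}

set.seed(1234)
p3 <- make_p3_class(rnorm(100))
print(p3)
calculate_CI(p3, conf = 0.90)
```

Inside a package, the methods would also need to be registered in the NAMESPACE, for example by adding roxygen2's @export tag to each method.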
In this part, you will create a vignette where you demonstrate the functions in your R package. Specifically, you will create an R Markdown document and put it in a folder called “vignettes” within your R package. The purpose of a vignette is to demonstrate the functions of your package in a longer tutorial instead of just short examples within the documentation of your functions (i.e. using the @examples tag in the documentation).
Hint: You might find the use_vignette() function from the usethis R package helpful.
Exp()
In the vignette, show how your function Exp(x,k) approximates the exp(x) function from base R as \(k\) increases.
For example, you could make a plot or you could show something like this in your vignette:
> # Taylor series approximation
> Exp(5,k=1)
[1] 6
> Exp(5,k=5)
[1] 91.41667
> Exp(5,k=10)
[1] 146.3806
> Exp(5,k=100)
[1] 148.4132
>
> # compared to
> exp(5)
[1] 148.4132
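The values above are consistent with the truncated Taylor series \(1 + \sum_{i=1}^{k} x^i / i!\). As a rough sketch of how a vignette might tabulate the convergence, the hypothetical function exp_taylor() below stands in for your own Exp(); it is an illustration under that assumption, not the required implementation.

```r
# Hypothetical stand-in for Exp(): truncated Taylor series of exp(x),
# i.e. the leading 1 plus the terms x^i / i! for i = 1, ..., k.
exp_taylor <- function(x, k) {
  stopifnot(is.numeric(x), length(x) == 1, k >= 1)
  i <- seq_len(k)
  1 + sum(x^i / factorial(i))
}

# Compare the approximation with base R's exp() as k grows.
ks <- c(1, 5, 10, 100)
approx_values <- vapply(ks, function(k) exp_taylor(5, k), numeric(1))
data.frame(
  k = ks,
  approximation = approx_values,
  abs_error = abs(approx_values - exp(5))
)
```

A table of absolute errors like this (or a plot of the error against \(k\)) makes the convergence easier to see than the raw values alone.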
calculate_CI()
To demonstrate the calculate_CI() function in the vignette, we will have a bit of Halloween fun. We will use this dataset from TidyTuesday.
It contains data from the TV show called Chopped:
“Chopped is an American reality-based cooking television game show series. It is hosted by Ted Allen. The series pits four chefs against each other as they compete for a chance to win $10,000.”
You can read more here about the show:
https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-08-25/readme.md
I have provided the code below for you to avoid re-downloading data:
library(here)
library(tidyverse)
# create a local directory named "data" if it does not already exist
if (!dir.exists(here("data"))) {
  dir.create(here("data"))
}
# download and save the data only once (not each time you knit an R Markdown document)
if (!file.exists(here("data", "chopped.RDS"))) {
  url_tsv <- 'https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-08-25/chopped.tsv'
  chopped <- readr::read_tsv(url_tsv)
  # save the data as an RDS object
  saveRDS(chopped, file = here("data", "chopped.RDS"))
}
Here we read in the .RDS dataset:
# A tibble: 569 × 21
season season_episode series_episode episode_rating episode_name
<dbl> <dbl> <dbl> <dbl> <chr>
1 1 1 1 9.2 Octopus, Duck,…
2 1 2 2 8.8 Tofu, Blueberr…
3 1 3 3 8.9 Avocado, Tahin…
4 1 4 4 8.5 Banana, Collar…
5 1 5 5 8.8 Yucca, Waterme…
6 1 6 6 8.5 Canned Peaches…
7 1 7 7 8.8 Quail, Arctic …
8 1 8 8 9 Coconut, Calam…
9 1 9 9 8.9 Mac & Cheese, …
10 1 10 10 8.8 String Cheese,…
# … with 559 more rows, and 16 more variables: episode_notes <chr>,
# air_date <chr>, judge1 <chr>, judge2 <chr>, judge3 <chr>,
# appetizer <chr>, entree <chr>, dessert <chr>, contestant1 <chr>,
# contestant1_info <chr>, contestant2 <chr>,
# contestant2_info <chr>, contestant3 <chr>,
# contestant3_info <chr>, contestant4 <chr>, contestant4_info <chr>
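Reading a cached file back might look like the sketch below. To stay self-contained it writes a tiny stand-in data frame to a temporary file rather than touching data/chopped.RDS; in the vignette the path would instead be here("data", "chopped.RDS"). The object names chopped_demo, rds_path and chopped_read are hypothetical, and the stand-in values are copied from the first two rows of the output above.

```r
# Stand-in for the downloaded data: two rows copied from the output above.
chopped_demo <- data.frame(
  season = c(1, 1),
  season_episode = c(1, 2),
  episode_rating = c(9.2, 8.8)
)

# Temporary path used only for this sketch.
rds_path <- file.path(tempdir(), "chopped_demo.RDS")

# Save only if the file is not already cached, then read it back.
if (!file.exists(rds_path)) {
  saveRDS(chopped_demo, file = rds_path)
}
chopped_read <- readRDS(rds_path)
str(chopped_read)
```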
This dataset includes a set of notes (the episode_notes column) that briefly describe what happened in the episode.
as_tibble(chopped) %>%
select(episode_notes)
# A tibble: 569 × 1
episode_notes
<chr>
1 This is the first episode with only three official ingredients in …
2 This is the first of a few episodes with five official ingredients…
3 <NA>
4 In the appetizer round, Chef Chuboda refused to use bananas in his…
5 <NA>
6 <NA>
7 In the appetizer Melinda did not get her quail on 1 plate. As a re…
8 In the appetizer round, Chef LePape failed to get any food onto hi…
9 Chef Lustberg has also competed on the ninth season of Hell's Kitc…
10 This is the first episode to feature four male chefs.
# … with 559 more rows
As Halloween is coming up at the end of this month, let’s show users of our R package how to create a confidence interval of the episode ratings for the episodes that were Halloween themed vs. those that were not. One might guess that Halloween-themed episodes are very popular (more so than the non-themed episodes – I mean who doesn’t love “blood sausage”, “coffin toast”, “gummy rats”, “deviled eggs”, or “chocolate covered bugs”??).
Our new package can help with this!
In this part, we will perform the following tasks in the vignette to demonstrate to the users the calculate_CI() function:
1. Remove any rows that have an NA in either of the two columns episode_notes or episode_rating.
2. Create a new column called has_halloween_theme that searches the character strings in episode_notes for the strings “halloween” or “Halloween”. This column should contain either TRUE or FALSE depending on whether the strings were found (TRUE) or not (FALSE).
3. Create two boxplots of the episode_rating column: one for the episodes with and one for the episodes without the Halloween theme. On top of the boxplot, plot the ratings for the two categories (hint: check out the geom_jitter() function in ggplot2).
4. Using the tidyverse and using our new S3 class (p3_class), calculate a 90% confidence interval of the ratings for the episodes with and without the Halloween theme.
Note: Steps 3 and 4 should be performed in two separate code chunks in the vignette.
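The first two tasks above can be sketched in base R as follows. The four rows are made up for illustration and are not taken from the Chopped data; in the vignette you would apply the same idea to the real chopped tibble, for example with dplyr::filter() and stringr::str_detect().

```r
# Made-up rows for illustration only (not from the Chopped dataset).
toy <- data.frame(
  episode_rating = c(8.1, NA, 8.4, 8.9),
  episode_notes = c(
    "A Halloween special with spooky baskets.",
    "Missing rating, so this row is dropped.",
    NA,
    "No theme in this episode."
  ),
  stringsAsFactors = FALSE
)

# Drop rows with NA in either column.
toy <- toy[!is.na(toy$episode_notes) & !is.na(toy$episode_rating), ]

# TRUE if the notes contain "halloween" or "Halloween", FALSE otherwise.
toy$has_halloween_theme <- grepl("[Hh]alloween", toy$episode_notes)
toy$has_halloween_theme
```

The character class [Hh] matches either capitalization of the first letter; a case-insensitive match (ignore.case = TRUE) would be a broader alternative that also catches strings such as "HALLOWEEN".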
Text and figures are licensed under Creative Commons Attribution CC BY-NC-SA 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Hicks (2021, Oct. 5). Statistical Computing: Project 3. Retrieved from https://stephaniehicks.com/jhustatcomputing2021/projects/2021-10-05-project-3/
BibTeX citation
@misc{hicks2021project, author = {Hicks, Stephanie}, title = {Statistical Computing: Project 3}, url = {https://stephaniehicks.com/jhustatcomputing2021/projects/2021-10-05-project-3/}, year = {2021} }