<- function(x, k) {
fn_cos # Add your solution here
}
<- function(x, k) {
fun_sin # Add your solution here
}
Project 2
Background
Due date: November 28 at 11:59pm
The goal of this homework is to write a set of functions and put them into an R package so that other people can easily use the functions in their own data analyses after installing the package. In addition, they would receive documentation on how to use the functions.
In addition to building the R package, you will also build a S3 class for your package, and create a vignette where you demonstrate the functions in your R package with an example dataset from TidyTuesday.
Finally, we will practice our command-line and version control skills by submitting the assignment through GitHub Classroom.
To submit your project
- The link to create a private GitHub repository for yourself to complete Project 2 will be posted in CoursePlus (Note: this creates an empty repository and you need to push your code in your locate remote repository to GitHub when ready).
- Build your R package locally and then push the files to the private Github repository that you created for yourself via GitHub Classroom.
- The TA will grade the R package by cloning the repository, installing it, and checking for all the things described below. It must be installable without any errors.
Part 1: Create an R package
Part 1A: Cosine and sine transformation
The cosine and sine of a number can be written as an infinite series expansion of the form
\[ \cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} \cdots \]
\[ \sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} \cdots \]
Write two functions that compute the cosine and sine (respectively) of a number using the truncated series expansion. Each function should take two arguments:
x
: the number to be transformedk
: the number of terms to be used in the series expansion beyond the constant 1. The value ofk
is always \(\geq 1\).
- You can assume that the input value
x
will always be a single number. - You can assume that the value
k
will always be an integer \(\geq 1\). - Do not use the
cos()
orsin()
functions in R.
Part 1B: Calculating confidence intervals
Write the following set of functions:
sample_mean()
, which calculates the sample mean
\[ \bar{x} = \frac{1}{N} \sum_{i=1}^n x_i \]
sample_sd()
, which calculates the sample standard deviation
\[ s = \sqrt{\frac{1}{N-1} \sum_{i=1}^N (x_i - \overline{x})^2} \]
calculate_CI()
, which calculates the confidence intervals of a sample mean and returns a named vector of length 2, where the first value is thelower_bound
, the second value is theupper_bound
.
\[ \bar{x} \pm t_{\alpha/2, N-1} s_{\bar{x}} \]
- You can assume that the input value
x
will always be a vector of numbers of length N. - Do not use the
mean()
andsd()
functions in R.
<- function(x) {
sample_mean # Add your solution here
}
<- function(x) {
sample_sd # Add your solution here
}
<- function(x, conf = 0.95) {
calculate_CI # Add your solution here
}
Part 1C: Put functions into an R package
Create an R package for the functions you wrote from Part 1A and 1B. Your package will have three exported functions for users to call (see below). You will need to write documentation for each function that you export. Your package should include the functions:
fn_cos()
, which computes the approximation to the cosine function (exported)fn_sin()
, which computes the approximation to the sine function (exported)sample_mean()
, which calculates the sample mean (not exported)sample_sd()
, which calculates the sample standard deviation (not exported)calculate_CI()
, which calculates the confidence intervals from simulated data (exported)
- Remember that you should only export the functions that you want the user to use.
- Functions that are not exported do not require any documentation.
- Each exported function should have at least one example of its usage (using the
@example
directive in the documentation). - In the functions in your package, consider using control structures and include checks (e.g.
is.na()
,is.numeric()
,if()
) to make sure the input is as you expect it to be. For example, try to break the the function with unexpected values that a user might provide (e.g. providing a negative value to a log transformation). This can help guide you on ways to address the possible ways to break the function. - Your package should be installable without any warnings or errors.
Part 2: Create a S3 class as part of your package
In this part, you will create a new S3 class called ci_class
(confidence interval class) to be used in your R package. You will
- Create a constructor function for the
ci_class
calledmake_ci_class()
. - Create a
print()
method to work with theci_class
to return a message with name of the class and the the number of observations in the S3 object. - Re-write the
calculate_CI()
function as a generic method with an implementation for theci_class
and still return alower_bound
andupper_bound
.
For example, this is what the output of your code might look like:
> set.seed(1234)
> x <- rnorm(100)
> obj <- make_ci_class(x)
> print(obj) # explicitly using the print() method
#> a ci_class with 100 observations
> obj # using autoprinting
#> a ci_class with 100 observations
Calculate a 90% confidence interval:
> calculate_CI(obj, conf = 0.90)
#> lower_bound upper_bound
#> -0.32353231 0.01000883
Part 3: Create supporting documents as part of your package
Part 3A: Create a vignette
In this part, you will create a vignette where you demonstrate the functions in your R package. Specifically, you will create a R Markdown and put it in a folder called “vignettes” within your R package. The purpose of a vignette is to demonstrate the functions of your package in a longer tutorial instead of just short examples within the documentation of your functions (i.e. using the @example
directive in the documentation).
You might find the use_vignette()
function from the usethis
R package helpful.
Part 3B: Create a README.md
file
Create a README.md
file in the R package, which will be useful to readers when they learn about your package. The readme must include:
- The title of package
- The author of the package
- A goal / description of the package
- A list of exported functions that are in the package. Briefly describe each function.
- A basic example with one of the functions.
You might find the use_readme_md()
function from the usethis
R package helpful.
Part 3C: Demonstrate fn_cos()
In the vignette, make a plot and show the output of your function fn_cos(x,k)
and how it approximates the cos(x)
function from base R as \(k\) increases.
- The x-axis should range between 0 and 10.
- The y-axis should be the output from
fn_cos(x,k)
orcos(x)
. - Plot the output from
cos(x)
as points on the graph. - Plot the output from
fn_cos(x,k)
as lines on the graph. - Show 5 lines for values
k
= 1, 3, 5, 7, 9. Each line should be a different color.
Part 3D: Demonstrate fn_sin()
Repeat a similar task and make a similar plot as in Part 3C, but here using fn_sin()
instead of fn_cos()
.
Part 3E: Demonstrate calculate_CI()
The goal here is to demonstrate the calculate_CI()
function in your package inside the vignette with some example data from TidyTuesday. However, part of the requirement is to also wrangle and plot the data. At the end of the section, you must demonstrate how to apply calculate_CI()
as an example to the data.
Other requirements for this part of vignette are the following:
- Pick any dataset you wish from TidyTuesday to analyze.
- You must describe what is the question you aim to answer with the data and data analysis.
- You must describe and link to where the original data come from that you chose.
- You must include a link to a data dictionary for the data or create one inside the webpage.
- Load the data into R (you must show the code from this section)
- In this step, you must test if a directory named
data
exists locally. If it does not, write an R function that creates it programmatically.
- Saves the data only once (not each time you knit/render the document).
- Read in the data locally each time you knit/render.
- Your analysis must include some form of data wrangling and data visualization.
- You must use at least eight different functions from
dplyr
,tidyr
,lubridate
,stringr
, orforcats
. - Your analysis should include at least three plots with you using at least three different
geom_*()
functions fromggplot2
(or another package withgeom_*()
functions).- Plots should have titles, subtitles, captions, and human-understandable axis labels.
- Apply the function
calculate_CI()
at least once in the vignette.
- Summarize and interpret the results in 1-2 sentences.
- At the end of the data analysis, list out each of the functions you used from each of the packages (
dplyr
,tidyr
,ggplot2
, etc) to help the TA with respect to making sure you met all the requirements described above.