# Add your solution here
Project 1
Background
Due date: November 8 at 11:59pm
The goal of this assignment is to practice some of the skills we have been learning about in class around quarto, command-line, and version control by building and deploying a website. You also are asked to practice with some command-line skills more formally.
To submit your project
Please use this Quarto file (.qmd
) and fill in the requested components by adding the URLs pointing to the GitHub repositories and deployed websites. Render this file to a HTML file.
You will push the .qmd
file and rendered HTML file to a private GitHub repostory on GitHub classroom. The link to create a private GitHub repository for yourself to complete Project 1 will be posted in CoursePlus (Note: this creates an empty repository. You will need to clone the reopository locally, add your code, and push your code from your local repo to the remote repository on GitHub Classroom when ready). Please show all your code, if relevant to a section.
The TAs will grade the contents in the GitHub repo by cloning the repo and checking for all the things described below.
Part 1
- Read this blogpost titled Building a brand as a scientist.
- Reflect on the questions in the “Defining your brand” section.
- Write two paragraphs (4-6 sentences) max here answering one (or more) of the questions asked in the section above.
Part 2
Next, with the reflections from Part 1 in mind, you will create a public GitHub repository on your own GitHub account and build a small website to introduce yourself to others in the course. You will also create a small data analysis on one of the webpages to practice literate programming in Quarto.
1. Create a GitHub repo for your website
Create a new public GitHub repository titled biostat777-intro-<firstname>-<lastname>
(where you replace <firstname>
with your first name and <lastname>
with your last name) in your own personal GitHub account (e.g. https://github.com/<yourgithubusername>/biostat777-intro-<firstname>-<lastname>
).
2. Build a website using Quarto
Create a new project locally within RStudio or VSCode and build a website for yourself. Your website should include the following:
- A home/landing page. This is home page that someone will land on your website. At minimum it should include your name, a short summary about yourself (max 2-3 sentences), and a picture of something you enjoy to do for fun (or a picture of yourself if you are comfortable sharing one).
- A page titled ‘About’. This page should describe who you are in greater detail. It could include your professional interests and your educational and/or professional background and/or experience. It could also include any personal information you feel conformable sharing on the website.
- A data analysis page called ‘Example analysis’. You can pick any dataset you wish you analyze. In this webpage, you will analyze a dataset and summarize the results. The requirements for this webpage are the following:
- You must describe what is the question you aim to answer with the data and data analysis.
- You must describe who is the intended audience for the data analysis.
- You must describe and link to where the original data come from that you chose.
- You must include a link to a data dictionary for the data or create one inside the webpage.
- Your analysis must include some minimal form of data wrangling with you using at least five different functions from
dplyr
ortidyr
. - Your analysis should include at least three plots with you using at least three different
geom_*()
functions fromggplot2
(or another package withgeom_*()
functions). - Plots should have titles, subtitles, captions, and human-understandable axis labels.
- At least one plot should using a type of faceting (
facet_grid()
orfacet_wrap()
). - Your analysis must include one image or table (not one you created yourself, but one you have saved locally or one from the web).
- Your analysis must include at least two different callout blocks.
- Your analysis must include a
.bib
file, which you use to reference at least three unique citations. For example, it could be to a website or paper from where the original data came from or it could be to a paper describing a method you are using to analyze the data. - Your analysis must include the use of at least 1 margin content.
- You must summarize your analysis and/or results with a paragraph (4-6 sentences).
- At the end of the data analysis, list out each of the functions you used from each of the packages (
dplyr
,tidyr
, andggplot2
) to help the TA with respect to making sure you met all the requirements described above.
3. Include a README.md
file
Your local repository should include a README.md
file describing who is the author of the website and a link to the website after it has been deployed. Other things you might include are the technical details for how the website was created and/or deployed.
4. Deploy your website
Deploy your website e.g. using Quarto Pub, GitHub pages, or Netlify, etc. (Note: Deploying your website to RPubs will not be accepted).
Part 3
Consider the dataset called students.csv
with the following contents:
ID,Name,Age,Gender,Grade,Subject
1,Alice,20,F,88,Math
2,Bob,22,M,76,History
3,Charlie,23,M,90,Math
4,Diana,21,F,85,Science
5,Eve,20,F,92,Math
6,Frank,22,M,72,History
7,Grace,23,F,78,Science
8,Heidi,21,F,88,Math
9,Ivan,20,M,85,Science
10,Judy,22,F,79,History
- Display the contents of the
students.csv
file using thecat
command.
# Add your solution here
- Display only the first 5 lines of the file using
head
to show the first few lines.
# Add your solution here
- Display only the last 3 lines of the file using
tail
to show the last few lines.
# Add your solution here
- Count the number of lines in the file using the
wc
command to count the lines.
# Add your solution here
- Find all students who are taking “Math” as a subject using
grep
to filter lines with the subject “Math”.
# Add your solution here
- Find all female students using
grep
orawk
to filter based on gender.
# Add your solution here
- Sort the file by the students’ ages in ascending order. Use
sort
to order by the “Age” column.
# Add your solution here
- Find the unique subjects listed in the file. Use
cut
,sort
, anduniq
to extract unique subjects.
# Add your solution here
- Calculate the average grade of the students. Use
awk
to sum the grades and divide by the number of students.
# Add your solution here
- Replace all occurrences of “Math” with “Mathematics” in the file. Use
sed
to perform the replacement.
# Add your solution here