Project 1
Background
Due date: November 10 at 11:59pm
The goal of this assignment is to practice some of the skills we have been learning in class around Quarto, the command line, and version control by building and deploying a website. You are also asked to practice some command-line skills more formally.
To submit your project
Please use this Quarto file (`.qmd`) and fill in the requested components by adding the URLs pointing to the private and public repositories and deployed websites. Render this file to an HTML file and submit your HTML file to the dropbox on CoursePlus. Please show all your code, if relevant to a section.
Part 1
Complete the Git & GitHub Fundamentals Starter course. The link to create a private GitHub repository for yourself to complete the course will be posted in CoursePlus. When you are done, add the link to the GitHub repo here:
- Link to your GitHub repository: [Delete this text and replace the text with the link to the private GitHub repo you created above]
Part 2
- Read this blogpost titled Building a brand as a scientist.
- Reflect on the questions in the “Defining your brand” section.
- Write at most two paragraphs (4-6 sentences) here answering one (or more) of the questions asked in the section above.
Part 3
Next, with the reflections from Part 2 in mind, you will create a public GitHub repository on your own GitHub account and build a small website to introduce yourself to others in the course. You will also create a small data analysis on one of the webpages to practice literate programming in Quarto.
1. Create a GitHub repo for your website
Create a new public GitHub repository titled `biostat777-intro-<firstname>-<lastname>` (where you replace `<firstname>` with your first name and `<lastname>` with your last name) in your own personal GitHub account (e.g. `https://github.com/<yourgithubusername>/biostat777-intro-<firstname>-<lastname>`).
2. Build a website using Quarto
Create a new project locally within RStudio and build a website for yourself. Your website should include the following:
- A home/landing page. This is the home page that someone will land on when visiting your website. At minimum it should include your name, a short summary about yourself (max 2-3 sentences), and a picture of something you enjoy doing for fun (or a picture of yourself if you are comfortable sharing one).
- A page titled ‘About’. This page should describe who you are in greater detail. It could include your professional interests and your educational and/or professional background and/or experience. It could also include any personal information you feel comfortable sharing on the website.
- A data analysis page called ‘Example analysis’. You can pick any dataset you wish to analyze. In this webpage, you will analyze a dataset and summarize the results. The requirements for this webpage are the following:
  - You must describe the question you aim to answer with the data and data analysis.
  - You must describe the intended audience for the data analysis.
  - You must describe and link to where the original data come from.
  - You must include a link to a data dictionary for the data or create one inside the webpage.
  - Your analysis must include some minimal form of data wrangling using at least five different functions from `dplyr` or `tidyr`.
  - Your analysis should include at least three plots using at least three different `geom_*()` functions from `ggplot2` (or another package with `geom_*()` functions).
  - Plots should have titles, subtitles, captions, and human-understandable axis labels.
  - At least one plot should use a type of faceting (`facet_grid()` or `facet_wrap()`).
  - Your analysis must include one image or table (not one you created yourself, but one you have saved locally or one from the web).
  - Your analysis must include at least two different callout blocks.
  - Your analysis must include a `.bib` file, which you use to reference at least three unique citations. For example, it could be to a website or paper from where the original data came from or it could be to a paper describing a method you are using to analyze the data.
  - Your analysis must include at least one piece of margin content.
  - You must summarize your analysis and/or results with a paragraph (4-6 sentences).
  - At the end of the data analysis, list each of the functions you used from each of the packages (`dplyr`, `tidyr`, and `ggplot2`) to help the TA verify that you met all the requirements described above.
3. Include a `README.md` file
Your local repository should include a `README.md` file that states who the author of the website is and links to the website after it has been deployed. Other things you might include are the technical details for how the website was created and/or deployed.
4. Deploy your website
Deploy your website using Quarto Pub, GitHub Pages, or Netlify. (Note: Websites deployed to RPubs will not be accepted.)
Part 4
- Use `wget` to download four files that end in `.fastq` from here.
- Create a directory to download the data. The top-level directory should be called `raw_data` and there should be a sub-level directory called `fastq`. The command you write should force the creation of both directories at the same time if either of them does not exist yet.
- Move all the `.fastq` files into the `fastq` sub-level directory.
- List all the `.fastq` files that end in a 12 or 13.
- Search for the string `NNNN` in the `SRR1039512_subset_1.fastq` file. In addition to returning the matched line in the `.fastq` file, your output should also return the two lines before and the four lines after the matching line.
- Write a for loop in the shell that iterates over each `.fastq` file. For each `.fastq` file, do the following. In the first 1000 rows for each file, count the number of lines where the “@” symbol appears. Your final output should be four numbers printed to the screen.
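The shell techniques these tasks rely on can be tried out on toy data first. The sketch below is not a solution to the assignment: it skips the `wget` step entirely, works in a temporary directory, and uses four made-up files named `sample_10.fastq` through `sample_13.fastq` whose names and contents are assumptions for illustration only. The glob pattern and file names would need adjusting for the real course files.

```shell
set -eu

# Work in a throwaway directory so nothing in a real project is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Create four tiny stand-in files (hypothetical names and contents, not the course data).
for id in 10 11 12 13; do
  printf '@read1\nACGT\n+\nIIII\n@read2\nNNNN\n+\n####\n' > "sample_${id}.fastq"
done

# -p creates the parent and child directories in one command
# and does not complain if either already exists.
mkdir -p raw_data/fastq

# Move every .fastq file in the current directory into the nested directory.
mv ./*.fastq raw_data/fastq/

# A bracket expression matches exactly one character from a set,
# so *1[23].fastq selects names ending in 12 or 13 before the extension.
matches="$(ls raw_data/fastq/*1[23].fastq)"
echo "$matches"

# -B N and -A N print N lines of context before and after each matching line.
context="$(grep -B 2 -A 4 'NNNN' raw_data/fastq/sample_12.fastq)"
echo "$context"

# Loop over the files; for each, count lines containing "@" within the first 1000 lines.
# grep -c exits non-zero when the count is 0, so "|| true" keeps set -e from aborting.
counts=""
for f in raw_data/fastq/*.fastq; do
  n="$(head -n 1000 "$f" | grep -c '@' || true)"
  echo "$n"
  counts="${counts}${n} "
done
```

Note that `grep -c` counts matching lines, not individual occurrences, and that in real FASTQ data the “@” character can also appear in quality strings, so this count is not necessarily the number of reads.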