Project 1

Building a website and practicing with command-line tools
Author
Affiliation

Department of Biostatistics, Johns Hopkins

Published

October 24, 2024

Background

Due date: November 8 at 11:59pm

The goal of this assignment is to practice some of the skills we have been learning about in class around quarto, command-line, and version control by building and deploying a website. You also are asked to practice with some command-line skills more formally.

To submit your project

Please use this Quarto file (.qmd) and fill in the requested components by adding the URLs pointing to the GitHub repositories and deployed websites. Render this file to a HTML file.

You will push the .qmd file and rendered HTML file to a private GitHub repostory on GitHub classroom. The link to create a private GitHub repository for yourself to complete Project 1 will be posted in CoursePlus (Note: this creates an empty repository. You will need to clone the reopository locally, add your code, and push your code from your local repo to the remote repository on GitHub Classroom when ready). Please show all your code, if relevant to a section.

The TAs will grade the contents in the GitHub repo by cloning the repo and checking for all the things described below.

Part 1

  1. Read this blogpost titled Building a brand as a scientist.
  2. Reflect on the questions in the “Defining your brand” section.
  3. Write two paragraphs (4-6 sentences) max here answering one (or more) of the questions asked in the section above.

Part 2

Next, with the reflections from Part 1 in mind, you will create a public GitHub repository on your own GitHub account and build a small website to introduce yourself to others in the course. You will also create a small data analysis on one of the webpages to practice literate programming in Quarto.

1. Create a GitHub repo for your website

Create a new public GitHub repository titled biostat777-intro-<firstname>-<lastname> (where you replace <firstname> with your first name and <lastname> with your last name) in your own personal GitHub account (e.g. https://github.com/<yourgithubusername>/biostat777-intro-<firstname>-<lastname>).

2. Build a website using Quarto

Create a new project locally within RStudio or VSCode and build a website for yourself. Your website should include the following:

  1. A home/landing page. This is home page that someone will land on your website. At minimum it should include your name, a short summary about yourself (max 2-3 sentences), and a picture of something you enjoy to do for fun (or a picture of yourself if you are comfortable sharing one).
  2. A page titled ‘About’. This page should describe who you are in greater detail. It could include your professional interests and your educational and/or professional background and/or experience. It could also include any personal information you feel conformable sharing on the website.
  3. A data analysis page called ‘Example analysis’. You can pick any dataset you wish you analyze. In this webpage, you will analyze a dataset and summarize the results. The requirements for this webpage are the following:
    • You must describe what is the question you aim to answer with the data and data analysis.
    • You must describe who is the intended audience for the data analysis.
    • You must describe and link to where the original data come from that you chose.
    • You must include a link to a data dictionary for the data or create one inside the webpage.
    • Your analysis must include some minimal form of data wrangling with you using at least five different functions from dplyr or tidyr.
    • Your analysis should include at least three plots with you using at least three different geom_*() functions from ggplot2 (or another package with geom_*() functions).
    • Plots should have titles, subtitles, captions, and human-understandable axis labels.
    • At least one plot should using a type of faceting (facet_grid() or facet_wrap()).
    • Your analysis must include one image or table (not one you created yourself, but one you have saved locally or one from the web).
    • Your analysis must include at least two different callout blocks.
    • Your analysis must include a .bib file, which you use to reference at least three unique citations. For example, it could be to a website or paper from where the original data came from or it could be to a paper describing a method you are using to analyze the data.
    • Your analysis must include the use of at least 1 margin content.
    • You must summarize your analysis and/or results with a paragraph (4-6 sentences).
    • At the end of the data analysis, list out each of the functions you used from each of the packages (dplyr, tidyr, and ggplot2) to help the TA with respect to making sure you met all the requirements described above.

3. Include a README.md file

Your local repository should include a README.md file describing who is the author of the website and a link to the website after it has been deployed. Other things you might include are the technical details for how the website was created and/or deployed.

4. Deploy your website

Deploy your website e.g. using Quarto Pub, GitHub pages, or Netlify, etc. (Note: Deploying your website to RPubs will not be accepted).

5. Share your website

Go to the Discussion Board in CoursePlus and write a short post with a link (URL) to your website (and URL to the corresponding GitHub repository) that you created. Also, list the URLs below for the purposes of grading.

As you read the introductions from other folks in the class, feel free to comment/reply using Discussion board.

  • Link to your GitHub repository: [Delete this text and replace the text with the link to the public GitHub repo you created above for your website]

  • Link to your deployed website: [Delete this and replace the text with the link to the public deployed website you created above]

Part 3

Consider the dataset called students.csv with the following contents:

ID,Name,Age,Gender,Grade,Subject
1,Alice,20,F,88,Math
2,Bob,22,M,76,History
3,Charlie,23,M,90,Math
4,Diana,21,F,85,Science
5,Eve,20,F,92,Math
6,Frank,22,M,72,History
7,Grace,23,F,78,Science
8,Heidi,21,F,88,Math
9,Ivan,20,M,85,Science
10,Judy,22,F,79,History
  1. Use wget to download the students.csv file locally from here.
# Add your solution here
  1. Display the contents of the students.csv file using the cat command.
# Add your solution here
  1. Display only the first 5 lines of the file using head to show the first few lines.
# Add your solution here
  1. Display only the last 3 lines of the file using tail to show the last few lines.
# Add your solution here
  1. Count the number of lines in the file using the wc command to count the lines.
# Add your solution here
  1. Find all students who are taking “Math” as a subject using grep to filter lines with the subject “Math”.
# Add your solution here
  1. Find all female students using grep or awk to filter based on gender.
# Add your solution here
  1. Sort the file by the students’ ages in ascending order. Use sort to order by the “Age” column.
# Add your solution here
  1. Find the unique subjects listed in the file. Use cut, sort, and uniq to extract unique subjects.
# Add your solution here
  1. Calculate the average grade of the students. Use awk to sum the grades and divide by the number of students.
# Add your solution here
  1. Replace all occurrences of “Math” with “Mathematics” in the file. Use sed to perform the replacement.
# Add your solution here