Project 4

Building data analytic products using multiple programming paradigms
Author
Affiliation

Department of Biostatistics, Johns Hopkins

Published

October 31, 2024

Overview

The purpose of Project 4 is to work together as a team to design and implement a data analytic product (e.g. a data analysis, a software package, a dashboard, simulation tool, etc) that integrates multiple programming paradigms. The project is intentionally open-ended with the goal it will give you and your team the chance to follow your own curiosity.

Deliverables

Under the Schedule on Course Plus, there will be Google Forms posted to submit the various components of the Final Project.

  1. Form a team (Due Friday November 8th at 11:59pm) and submit the names of your team members using a Google Form on CoursePlus. You should aim to have minimum 3, but ideally 4 (max) people on your team. Please identify a team leader who will fill out this form only on behalf of all team members. This data is very important because we will start scheduling meetings with teams for the week of Nov 18-22 based on the information provided by Nov 8th.

  2. Submit a final project proposal (Due Friday November 15th at 11:59pm) using a Google Form on CoursePlus. The team leader who was identified in Part 1 of the Final project should fill out this form only on behalf of all team members. You will need to upload a PDF of your project proposal answering the questions below describing what should be in the project proposal.

  3. Meet with the instructor or TAs the week of November 18-22, 2024. The instructor or TAs will contact teams individuall to schedule a time to meet. The purpose of this meeting is to make sure the project is feasible, not too big or small in scope, data is accessible, etc.

  4. Prepare a final project presentation (Due Wednesday December 11th at 11:59pm) and upload a PDF of the presentation using a Google Form on CoursePlus. Your team also will need to sign up for a date to present as a team on either December 12th or December 17th in class. You are expected to present as a team on that date. The sign up for the schedule will be posted soon. The content of what should be in the presentation is described below.

  5. Submit a final project write up (Due Thursday December 19th at 11:59pm) to GitHub Classroom. You will also submit a link to the GitHub Classroom repository using a Google Form on CoursePlus. Please prepare a write up of your final project and upload the write up. The content of what should be in the write up is described below.

  6. Submit group participation evaluation (Due Thursday December 19th at 11:59pm) using a Google Form on CoursePlus. This will include both a self-evaluation and a peer-evaluation on your team member.

Grading

The breakdown for how the components in the final project will be weighted towards the final project (or Project 4) grade:

  • Submit names of team members (5%)
  • Submit project proposal (20%)
  • Meet with the instructor or TAs to discuss project proposal as a team (15%)
  • Project presentation as a team (25%)
  • Final project write up via GitHub Classroom (25%)
  • Group participation (self- and peer-evaluation) (10%)
Important

Team members will recieve the same grade for everything except the group participation. This will be evaluated separately for each individual depending on their level of participation and contribution to the team. We will ask each of you to self-evaluation your participation and we will ask your team members to evaluate your particiption.

If you all contribute to the project, this should be an easy 10%. If you let your team members do all the work and you don’t contribute, this will reflect in your final project grade.

Final project proposal

Using quarto, write up a project proposal and submit the PDF. The final project proposal should include the following information:

  • The title of your project and the team members names
  • You should describe a research or data analysis question and explain its importance
  • You should summarize work that already exists (if it does)
  • You should outline the work you plan to do
  • You should demonstrate you have access to the data, describe the data, and propose how you will collect the data
  • You should describe the programming paradigms you plan to use and why it makes sense to combine them for your project
  • You should describe any packages and/or software you plan to use
  • You should briefly describe the data analytic product you plan to build
  • You should describe a tentative timeline for the proposal
  • You should describe how the tasks will be split amongst the team members

Data or database collection

Your final project must demonstrate collecting data from a source in a non-trival way. An example of something trival might be to simply read in a CSV file. In class, we talked about extracting data from APIs, HTML, or SQL databases. These are not the only ways to collect data, but the final project should include either one of the following paradigms around data / database collection or the team could propose another (non-trivial) data collection paradigm they would like to propose for the project.

  • Data collection paradigms
  • Database programming paradigms

Programming paradigms

Your final project must use at least two (or more is OK too, but at minimum two) of the following programming paradigms we discussed in class:

  • Programming in the command line (e.g. shell scripting)
  • Functional programming paradigms
  • Object oriented programming paradigms
  • Parallel computing paradigms
  • Machine learning paradigms

Products

Your final project must include building a data analytic products, such as:

  • A data analysis summarized in a deployed quarto website
  • An R package with a deployed pkgdown website
  • A deployed dashboard (e.g. using flexdashboard or shiny)

Example ideas

Here are some example ideas for potential projects:

  • Build a custom SQL database to track outcomes for a particular disease and then build machine learning models to predict disease spread allowing for functional modules for data cleaning.
  • Build an R package (i.e. object oriented programming) around pulling data from the Open Baltimore API, which includes writing functions leveraging functional programming paradigms.
  • Build machine learning models leveraging parallel computing for distributed data (e.g. data not all in one place or data that can’t be read into memory at once).
  • Extract data from an API and build a dashboard to tackle a statistical problem that might leverage functional programming and parallel computing paradigms.
  • Build an R package to help customer retention for a business using shell scripts to run simulations.

Again, the goal is for your team to investigate a question of interest with data of your choice that integrates multiple programming paradigms and finally builds some type of data analytic product. It is intentionally open and broad, but hopefully more fun for your team!

Final project presentation

Depending on how many teams are formed, this will determine how long teams have to present, but given the size of the class, it will likely be around 10-15 mins in length. You will be asked to turn in slides for your presentation before you present.

Give a presentation explaining the importance of the project, an overview of the technical challenges, and what you learned. Plan for a few minutes of questions at the end. Also consider the following:

  • Presentation quality: Is the presentation clear on what was the question being investigated? What were the orginal goals and what was accomplished along the way? Was there previous work in this area? Are the slides readable? Do the figures have large enough legends and figure titles? Did you describe the data?
  • Paradigm integration: How effectively does your project demonstrate multiple paradigms?
  • Functionality: Does the data analytic product work as intended? Are the components well-integrated?
  • Usability and documentation: Is the data analytic product easy to use/read and well-documented?
  • Originality and complexity: Does the project address a non-trivial statistical or programming problem with creativity?

Final project write up

Write up a summary of your final project and submit it to GitHub classroom.

Please explain the importance of the project, give an overview of the technical challenges, and what you learned. Also consider the following:

  • Write up quality: Is the write up clear on what was the question being investigated? What were the orginal goals and what was accomplished along the way? Was there previous work in this area? Do the figures have large enough legends and figure titles? Did you describe the data?
  • Is there a README.md in the repo you push to GitHub classroom summarizing key details about the project, including team members and an overview of the final project?
  • Have you linked to all code and data needed to reproduce your work?
  • Paradigm integration: How effectively does your project demonstrate multiple paradigms?
  • Functionality: Does the data analytic product work as intended? Are the components well-integrated?
  • Usability and documentation: Is the data analytic product easy to use/read and well-documented?
  • Originality and complexity: Does the project address a non-trivial statistical or programming problem with creativity?