Introduction to version control

Author
Affiliation

Department of Biostatistics, Johns Hopkins

Published

October 31, 2024

Pre-lecture activities

Important

In advance of class, please follow the instructions in Software Carpentry: Version Control with Git for:

  1. Installing Git
  2. Creating a GitHub Account
  3. Preparing Your Working Directory

In addition, please read through Chapters 1-9.

How much should I prepare before class?

Before class starts, you should be comfortable with the meaning of version control, have git installed, have created a GitHub account (if you did not already have one), have prepared your working directory, understand the meaning of most of the git commands, and be able to execute commands on your computer.

During class, I will give an overview of these topics and then we will practice using version control in groups of two with an in-class activity. I will walk around the class answering questions as they arise in practice. If you have not installed git, created a GitHub account, and prepared your working directory, it will be challenging to participate in the activity.

Lecture

Acknowledgements

Material for this lecture was borrowed and adapted from

Learning objectives

At the end of this lesson you will:

  • Understand the benefits of an automated version control system.
  • Understand how to set up git.
  • Understand how to set up a git repository.
  • Understand how to track changes, explore history, and ignore files in a git repository.
  • Understand git remotes.
  • Understand how to use GitHub.
  • Understand how to collaborate with others using git and GitHub.

Slides

Class activity

For the rest of the time in class, we will practice using version control to collaborate with other people.

Objectives of the activity
  • Clone a remote repository using git clone.
  • Collaborate by using git add, git commit, git push, and git pull to a common repository.
  • Describe the basic collaborative workflow.

For this in-class activity, you need to find a partner. One person will be the “Owner” of a GitHub repository and the other will be the “Collaborator” on the GitHub repository.

The goal of the activity is for the Collaborator and Owner to alternate adding content to a GitHub repository while working collaboratively on a project. We will switch roles at the end, so both people will play Owner and Collaborator.

Part 1: Repository Setup

  • Create a new repository. One partner (the “Owner” of the GitHub repo) should create a GitHub repository titled bikelanes in their own personal GitHub account.
  • Invite a collaborator. The repository owner needs to invite their partner as a collaborator by going to Settings > Collaborators > Manage Access. Specifically, on GitHub, click the “Settings” button on the right, select “Collaborators”, click “Add people”, and then enter your partner’s GitHub username.

To accept access to the Owner’s repo, the Collaborator needs to go to https://github.com/notifications or check for an email notification, and accept the invitation from there.

  • Clone the repository. Both the Owner and Collaborator should clone the repository locally to their respective computers.

If using SSH:

git clone git@github.com:<change-to-owners-github-username>/bikelanes.git

If using HTTPS:

git clone https://github.com/<change-to-owners-github-username>/bikelanes.git

Part 2: Adding content to the repository

  • Add a CSV file to the local repository. Consider the dataset called bike_lanes_data.csv with the following contents:
Lane_ID,Street_Name,Length_km,Type,Year_Installed,City
1,Main St,1.2,Protected,2018,Springfield
2,Elm St,0.8,Shared,2016,Springfield
3,Maple Ave,2.3,Buffered,2020,Springfield
4,Oak St,1.5,Protected,2019,Springfield
5,Pine St,0.5,Shared,2015,Springfield
6,River Rd,3.1,Buffered,2021,Springfield

The Owner should create a new CSV locally titled bike_lanes_data.csv and paste the data into the new CSV file. Save the file and commit the file to the local repository using git commands.

cd bikelanes
touch bike_lanes_data.csv # (and paste in the content above)
git status
git add bike_lanes_data.csv
git commit -m "adding bike lanes data"
git push origin main
  • Collaborator needs to pull the new changes locally to their computer. Next, the collaborator should pull the changes from the remote bikelanes GitHub repository to their local computer. This should make the new bike_lanes_data.csv appear locally on their computer.
cd bikelanes
git pull origin main
  • Add a README.md file. The collaborator should create a new file called README.md in the bikelanes repository. Please add the following information to it:
  1. The names of the owner and collaborator of the project
  2. The emails of the owner and collaborator of the project
  3. Answers to the following questions:
    • Do you enjoy biking?
    • What type of biking (e.g., road biking, mountain biking, no biking, etc.) do you enjoy?
    • If not biking, what hobbies do you enjoy doing?

After the README.md file is complete, add and commit the file to the local repository using git commands. Push the changes to the remote bikelanes repository on GitHub.

touch README.md # (add the content)
git status
git add README.md
git commit -m "adding readme file"
git push
  • Owner needs to pull the new changes locally to their computer. Next, the owner of the repository should pull the changes from the remote bikelanes GitHub repository to their local computer. This should make the new README.md appear locally on their computer.
cd bikelanes
git pull
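
The exchange in Part 2 can also be rehearsed offline. The following is a minimal sketch, assuming a local bare repository as a stand-in for GitHub; the temporary directory, the folder names owner and collaborator, the placeholder identities, and the shortened CSV (header plus first row) are illustrative choices, not part of the activity.

```shell
# Offline rehearsal of Part 2. A local bare repository stands in for GitHub.
# All paths, names, and emails below are placeholders, not part of the activity.
set -eu

WORK=$(mktemp -d)
cd "$WORK"

# Stand-in for the empty GitHub repository; point its HEAD at "main".
git init --quiet --bare bikelanes.git
git --git-dir=bikelanes.git symbolic-ref HEAD refs/heads/main

# Owner: clone (git warns that the repository is empty), add the CSV, push.
git clone --quiet bikelanes.git owner
cd owner
git symbolic-ref HEAD refs/heads/main   # make sure the first branch is "main"
git config user.name "Owner"
git config user.email "owner@example.com"
printf '%s\n' \
  'Lane_ID,Street_Name,Length_km,Type,Year_Installed,City' \
  '1,Main St,1.2,Protected,2018,Springfield' > bike_lanes_data.csv
git add bike_lanes_data.csv
git commit --quiet -m "adding bike lanes data"
git push --quiet origin main
cd ..

# Collaborator: clone the now non-empty repository, add a README, push.
git clone --quiet bikelanes.git collaborator
cd collaborator
git config user.name "Collaborator"
git config user.email "collaborator@example.com"
echo "# bikelanes" > README.md
git add README.md
git commit --quiet -m "adding readme file"
git push --quiet origin main
cd ..

# Owner: pull the Collaborator's commit; both files are now present locally.
cd owner
git pull --quiet origin main
ls
```

Because each side pulls before making new changes, every pull here is a simple fast-forward and no merge conflict arises.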

Part 3: Create GitHub issues to assign tasks

The partners should create at least one GitHub Issue for a set of future tasks for the project (e.g., “Add bike lane data from our local area” or “Write code to analyze bike lane usage trends”).

Assign these issues to each other and set a due date, if desired.

Part 4: Review the git history

Both partners should review the commit history using git commands.

Expected commands: git log
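
A few common forms of git log are sketched below. The throwaway repository, the file notes.txt, and the identity are assumptions made only so the example runs on its own; in the activity you would run the git log lines inside your bikelanes clone.

```shell
# Build a small throwaway repository with two commits (placeholder content).
set -eu
REPO=$(mktemp -d)
cd "$REPO"
git init --quiet
git config user.name "Your Name"
git config user.email "you@example.com"
echo "first line" > notes.txt
git add notes.txt
git commit --quiet -m "adding notes"
echo "second line" >> notes.txt
git commit --quiet -am "updating notes"   # -a stages already-tracked files

# --no-pager prints directly instead of opening a pager.
git --no-pager log                          # full entries: hash, author, date, message
git --no-pager log --oneline                # one line per commit
git --no-pager log -1 --stat                # latest commit with the files it changed
git --no-pager log --oneline -- notes.txt   # only commits that touched notes.txt
```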

Part 5: Switch roles and repeat process

Switch roles and repeat the whole process.

Reflection Questions

  • What challenges did you encounter when collaborating on GitHub?
  • How did using Issues help you coordinate work?
  • In what ways could GitHub features be useful for larger projects, such as full bike lane data studies?

Post-lecture

A short example of a typical workflow in an order that will minimize merge conflicts:

  • Update local repo: git pull origin main
  • Make changes: e.g. echo 100 >> numbers.txt
  • Stage changes: git add numbers.txt
  • Commit changes: git commit -m "Add 100 to numbers.txt"
  • Update remote: git push origin main

There are other resources available on the Software Carpentry website, in particular on:

  • How version control can help make your work more open and support Open Science. Open scientific work is more useful and more highly cited than closed.
  • What type of license you should consider including with your work, in the lesson on Licensing.
  • How to make your work easier to cite, in the lesson on Citations.

Additional practice

Here are some additional practice questions to help you think about the material discussed.

Questions
  • Initialize a new Git repository.
    1. Create a new folder on your computer.
    2. Use the git init command to turn it into a Git repository.
    3. Configure your name and email using git config --global user.name "Your Name" and git config --global user.email "you@example.com".

Goal: Learn to set up Git for the first time and understand how Git tracks your identity.
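
One possible way to work through this exercise is sketched below. The folder name my-project is hypothetical, and the sketch deliberately sets the identity for this one repository only (no --global), so that running it does not overwrite settings you already have.

```shell
set -eu

# Hypothetical project folder, created under a temporary directory.
PROJECT="$(mktemp -d)/my-project"
mkdir -p "$PROJECT"
cd "$PROJECT"

# Turn the folder into a Git repository (creates the hidden .git directory).
git init --quiet

# The exercise uses --global, which applies to every repository for your user.
# Without --global, the setting is stored in this repository's .git/config only.
git config user.name "Your Name"
git config user.email "you@example.com"

# Read the values back to confirm what Git will record in commits.
git config user.name
git config user.email
```

Running git config --list --show-origin afterwards shows which file each setting comes from, which helps when a global and a repository-level value disagree.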

  • Create a new file and commit it.
    1. Create a simple text file in your repository (e.g., README.md).
    2. Use git add <filename> to stage the file.
    3. Use git commit -m "Initial commit" to commit it to the repository.

Goal: Understand how to add and commit files in Git.

  • Modify a file and create a new commit.
    1. Make changes to the file (e.g., add some text to the README.md).
    2. Stage the changes and commit them using git add and git commit.
    3. Use git log to see the history of your commits.

Goal: Learn how to track changes over time and inspect the commit history.
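
The two exercises above can be sketched together as one stage-and-commit cycle. This is only an illustration under assumed names: the temporary repository, the README.md text, and the second commit message are placeholders.

```shell
set -eu
REPO=$(mktemp -d)
cd "$REPO"
git init --quiet
git config user.name "Your Name"
git config user.email "you@example.com"

# Create a file, stage it, and commit it.
echo "# My project" > README.md
git status --short                 # README.md is listed as untracked (??)
git add README.md
git status --short                 # README.md is listed as added (A)
git commit --quiet -m "Initial commit"

# Modify the file, inspect the change, then stage and commit it.
echo "Some more text." >> README.md
git --no-pager diff                # changes not yet staged
git add README.md
git --no-pager diff --staged       # changes that the next commit will record
git commit --quiet -m "Add text to README"

# Inspect the history: two commits, newest first.
git --no-pager log --oneline
```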

  • Work with branches.
    1. Create a new branch using git checkout -b feature-branch.
    2. Make changes to a file, stage, and commit them.
    3. Switch back to the main branch (git checkout main).
    4. Merge your changes from feature-branch into main using git merge feature-branch.

Goal: Understand how branching works and how to merge changes from different branches.
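
A minimal sketch of the branching exercise follows, assuming a throwaway repository and a placeholder file notes.txt. Because main gets no new commits while feature-branch is being worked on, the merge here is a fast-forward.

```shell
set -eu
REPO=$(mktemp -d)
cd "$REPO"
git init --quiet
git symbolic-ref HEAD refs/heads/main   # name the first branch "main"
git config user.name "Your Name"
git config user.email "you@example.com"

echo "line 1" > notes.txt
git add notes.txt
git commit --quiet -m "Initial commit"

# Create and switch to a new branch, then commit a change there.
git checkout --quiet -b feature-branch
echo "line 2" >> notes.txt
git commit --quiet -am "Add line 2 on feature-branch"

# Switch back to main and merge the feature branch into it.
git checkout --quiet main
git merge --quiet feature-branch

git branch        # lists both branches; the current one is marked with *
cat notes.txt     # main now contains the line added on feature-branch
```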

  • Simulate and resolve a merge conflict.
    1. Create a new branch (git checkout -b conflicting-branch), edit a line in a file, and commit the change.
    2. Switch back to main (git checkout main), edit the same line differently, and commit it.
    3. Merge conflicting-branch into main (git merge conflicting-branch); git will stop and report a merge conflict.
    4. Use git status to see the files with conflicts, then edit them to resolve the conflict markers.
    5. Stage and commit the resolved changes.

Goal: Learn how to handle merge conflicts and understand conflict markers.
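
The sketch below shows one way to produce and resolve a conflict. Note that a conflict only arises when both branches commit a change to the same line after they diverge, so the sketch creates the branch first and then edits on each side. The file settings.txt, its contents, and the chosen resolution are placeholders.

```shell
set -eu
REPO=$(mktemp -d)
cd "$REPO"
git init --quiet
git symbolic-ref HEAD refs/heads/main   # name the first branch "main"
git config user.name "Your Name"
git config user.email "you@example.com"

echo "color: blue" > settings.txt
git add settings.txt
git commit --quiet -m "Initial commit"

# Change the line on a new branch...
git checkout --quiet -b conflicting-branch
echo "color: green" > settings.txt
git commit --quiet -am "Use green"

# ...and change the same line differently on main.
git checkout --quiet main
echo "color: red" > settings.txt
git commit --quiet -am "Use red"

# The merge stops with a non-zero exit status because of the conflict.
git merge conflicting-branch || echo "Merge stopped: resolve the conflict"

git status --short    # settings.txt is listed as unmerged (UU)
cat settings.txt      # shows the <<<<<<<, =======, and >>>>>>> markers

# Resolve by replacing the marked region with the text you want to keep
# (done here by overwriting the file), then stage and commit.
echo "color: teal" > settings.txt
git add settings.txt
git commit --quiet -m "Merge conflicting-branch and resolve conflict"
```

In practice you would open the file in an editor, keep the lines you want between the markers, and delete the markers themselves before staging.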