Version Control with Git and Collaboration with GitHub

Parks Canada; Ecological Integrity Monitoring Program

Andy Teucher

Schedule

Pacific Time    Eastern Time    Duration
09:30 – 10:50   12:30 – 13:50   80 min
Break           Break           15 min
11:05 – 12:25   14:05 – 15:25   80 min
Break           Break           15 min
12:40 – 14:00   15:40 – 17:00   80 min

What is Git?

  • A version control system
  • It tracks and manages the evolution of a set of files — called a repository*
  • It keeps a history of changes to files, who made the changes, and when
  • It allows you to revert to previous versions of files
  • It enables collaboration by merging changes from multiple contributors
  • It is a command-line tool, but graphical user interfaces (GUIs) are also available

*repository (repo) == folder == RStudio or Positron project

Using a project-oriented workflow is critical

Why Git?


“Commits” help us focus on the work, not the bookkeeping.

It takes care of otherwise time-consuming versioning & unrelenting file tracking.

Why GitHub?

What is GitHub (traditional answer)?

  • GitHub is an online collaborative platform. It is built on top of Git, the software that does file versioning and bookkeeping.
  • It is a remote host for your Git repositories.
  • It provides a web interface for managing, sharing, and collaborating on your repositories.
  • You can think of GitHub as working like Dropbox, but with more control.

What is GitHub (non-traditional answer)?

  • Publishing platform. A new way to share documentation & science communication (scicomm): your files can become websites! Share the URL once, and it always points to the latest version.

  • Project management system. Short- and long-term collaborative “todo” lists: “Issues” & “Projects”

GitHub account management

  • You can have one GitHub account for all of your work and personal projects
  • Add different email addresses to your GitHub account
    • work email for work projects
    • personal email for personal projects
    • associate activity with the correct email address
  • You can use multiple GitHub accounts, but it is more complicated to set up
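A repo-local `user.email` overrides the global one, which is one way to keep work commits on your work address while using a single account. A minimal sketch in a throwaway repo; the work address `jane.doe@work.example` and the temporary folder are hypothetical, and only repo-local config is written, so your global settings are untouched. GitHub links a commit to your account only if its email is one of the addresses added to that account.

```shell
# Throwaway repo standing in for a work project (hypothetical)
demo=$(mktemp -d)
cd "$demo"
git init -q -b main

# Repo-local settings live in .git/config and win over ~/.gitconfig
git config user.name "Jane Doe"
git config user.email "jane.doe@work.example"

# Show which email Git will use here, and which file it came from
git config --show-origin user.email

# Commits in this repo are recorded with the work address
printf '# Work project\n' > README.md
git add README.md
git commit -q -m "Add README"
git log -1 --format='%an <%ae>'
```

If all your work projects live under one folder, Git's conditional includes (`includeIf "gitdir:..."` in your global config; see `git help config`) can set the work email automatically.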

Let’s set up!

An aside: set up your terminal in Positron

  • Unless you are already a Windows PowerShell or Command Prompt user, I recommend switching the terminal in Positron to Git Bash (installed with Git for Windows).

  • Ctrl-Shift-P to open the command palette
  • search for "terminal default profile"
  • select “Terminal: Select Default Profile” and set it to “Git Bash”

settings.json

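{
  // Path to the Git executable (not bash.exe); adjust if Git is installed elsewhere
  "git.path": "C:/Program Files/Git/bin/git.exe",
  // Use Git Bash as the default integrated terminal on Windows
  "terminal.integrated.defaultProfile.windows": "Git Bash",
  // Optional: RStudio-style keyboard shortcuts in Positron
  "workbench.keybindings.rstudioKeybindings": true
}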

Introduce yourself to Git

Configure your user.name and user.email for Git. This information will be recorded with every commit you make.


In the terminal

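# check that Git is installed, and where
which git
# set the name and email recorded in your commits (for all repos)
git config --global user.name "Jane Doe"
git config --global user.email "jane@doe.ca"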


In Positron or RStudio

# writes user.name and user.email to your global (user-level) Git config
usethis::use_git_config(
  user.name = "Jane Doe",
  user.email = "jane@doe.ca"
)

Check your Git setup with a “situation report”

usethis::git_sitrep()
── Git global (user)
• Name: 'Andy Teucher'
• Email: 'andy.teucher@gmail.com'
• Global (user-level) gitignore file: '/Users/andy/.gitignore'
• Vaccinated: TRUE
ℹ Defaulting to 'https' Git protocol
• Default Git protocol: 'https'
• Default initial branch name: 'main'

── GitHub user
• Default GitHub host: 'https://github.com'
• Personal access token for 'https://github.com': '<discovered>'
• GitHub user: 'ateucher'
• Token scopes: 'gist, repo, user, workflow'
• Email(s): 'andy.teucher@gmail.com (primary)'
ℹ No active usethis project


It is essential to check that your name and email are correct and that the personal access token (PAT) shows as “discovered”

Do you see anything different in your output?

We can interact with Git and GitHub:

  • from within the Positron GUI,
  • using usethis Git-related functions, or
  • using command-line tools plus the GitHub web interface

Starting a new project with Git and GitHub

  • GitHub First
    • Create a new repo on GitHub, then clone it to your computer
    • Preferred, cleanest workflow

  • Local First
    • Create a new repo on your computer, then push it to GitHub
    • Totally fine, but a bit more awkward

We will:

  1. Create a new repo on GitHub
  2. Clone it to your computer and open as a new project using Positron
  3. Create some commits
  4. Push the commits to GitHub

We’re going to make a repository for recipes!

Organization:

  • README.md with a description of the project
  • one .md file per recipe in a recipes/ folder

Making a commit

  • Make changes to file(s)
  • Choose which files you want to include in the commit
  • Stage the changes (using git add)
  • Commit the staged changes (using git commit)
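The four steps above, sketched in a terminal. This is a minimal example in a throwaway repo; the temporary folder and the file contents are hypothetical, chosen to match the recipes project, and the repo-local name and email are only there so the demo can commit even without the global configuration from earlier.

```shell
# Throwaway repo in a temporary folder (hypothetical demo location)
demo=$(mktemp -d)
cd "$demo"
git init -q -b main
# Demo-only, repo-local identity; normally the global config is used
git config user.name "Jane Doe"
git config user.email "jane@doe.ca"

# 1. Make changes to file(s)
mkdir recipes
printf '# Recipes\n\nA collection of recipes.\n' > README.md
printf '# Pancakes\n' > recipes/pancakes.md

# 2-3. Choose what goes in the commit, and stage only that (README.md)
git add README.md
git status --short

# 4. Commit the staged changes with a descriptive message
git commit -q -m "Add README"
git log --oneline
```

The recipe file stays untracked until it is staged in a later commit. `git init -b main` needs Git 2.28 or newer; on older versions, drop `-b main` and run `git branch -m main` after the first commit.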

Converting an existing project to use Git and GitHub

  1. Initialize a Git repository in your existing project
  2. Commit the existing files
  3. Create a new repo on GitHub
  4. Link your local repo to the GitHub repo
  5. Push the commits to GitHub
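A minimal terminal sketch of these five steps. To keep it runnable offline, a local bare repository stands in for the GitHub repo (an assumption for the demo); with GitHub you would create an empty repo (no README, license, or .gitignore, so the first push is not rejected) and use its `https://github.com/<user>/<repo>.git` URL in `git remote add`. The project folder and file are hypothetical.

```shell
# Hypothetical stand-ins: an "existing project" and a bare repo playing
# the role of the empty GitHub repo created in step 3
work=$(mktemp -d)
project="$work/my-project"
remote="$work/github-standin.git"
mkdir -p "$project"
printf 'x <- 1\n' > "$project/analysis.R"
git init -q --bare -b main "$remote"

cd "$project"
# 1. Initialize a Git repository in the existing project
git init -q -b main
git config user.name "Jane Doe"      # demo-only, repo-local identity
git config user.email "jane@doe.ca"

# 2. Commit the existing files
git add analysis.R
git commit -q -m "Initial commit"

# 4. Link the local repo to the remote (the GitHub URL in real use)
git remote add origin "$remote"

# 5. Push, and make origin/main the upstream of the local main branch
git push -q -u origin main
```

From R, `usethis::use_git()` followed by `usethis::use_github()` covers the same steps, including creating the GitHub repo.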

Ignoring things

  • What about files that you don’t want to track with Git or put on GitHub?
    • large/sensitive data files,
    • rendered outputs,
    • temporary files created by your IDE
  • .gitignore file in the root of your project
# R data files
*.RData
*.rda

# Specific data files
data/large_data.csv

# output files
reports/
  • Each line specifies a pattern
  • Lines starting with # are comments
  • Use wildcards (e.g., *) to match many files at once
  • Match individual files or entire directories (a trailing / matches only a directory)
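To see these rules in action, this sketch recreates the example .gitignore in a throwaway repo with some hypothetical files, then asks Git which paths it ignores. `git check-ignore -v` reports the matching pattern and its line number, and `git status` lists only the files that are not ignored.

```shell
# Throwaway repo with the example .gitignore (hypothetical files)
demo=$(mktemp -d)
cd "$demo"
git init -q -b main
cat > .gitignore <<'EOF'
# R data files
*.RData
*.rda

# Specific data files
data/large_data.csv

# output files
reports/
EOF

mkdir -p data reports
touch analysis.R session.RData data/large_data.csv data/small.csv reports/report.html

# Which ignored paths matched, and which .gitignore line matched them
git check-ignore -v session.RData data/large_data.csv reports/report.html

# Only files that are not ignored show up as untracked
git status --short --untracked-files=all
```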

Create/add to a .gitignore file

Or

usethis::use_git_ignore(
  c("*.RData", "data/large_data.csv", "reports/")
)
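Without R, a rough terminal equivalent is to append each pattern yourself, skipping any that are already present (as `use_git_ignore()` does). A minimal sketch assuming a POSIX shell with `grep`; it is not how usethis is implemented, the folder is a hypothetical temporary one, and it assumes an existing .gitignore ends with a newline.

```shell
# Hypothetical project folder with one pattern already ignored
demo=$(mktemp -d)
cd "$demo"
printf '*.RData\n' > .gitignore

# Append each pattern only if that exact line is not already present
# (-x whole line, -F literal text, -s no error if .gitignore is missing)
for pattern in "*.RData" "data/large_data.csv" "reports/"; do
  grep -qsxF -- "$pattern" .gitignore || printf '%s\n' "$pattern" >> .gitignore
done
cat .gitignore
```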

Your Turn

  1. Find an existing project on your computer, and turn it into a Git repository
  2. Choose carefully which files to include in your first commit (remember to use a .gitignore file), and commit them
  3. Create a new repository on GitHub and push your local commits to GitHub

Collaborating

  • Git is a distributed version control system
  • Each collaborator has a full copy of the repository, including its history
  • All collaborators can push changes to the remote repository (if they have permission)
  • Changes made by collaborators can be pulled into your local repository

Your turn

  • In pairs, one person is the “owner” of a repo, the other is a “collaborator”
  • Owner adds the collaborator to the repo on GitHub (repo Settings, under Collaborators)
  • Collaborator clones the repo to their computer
  • Collaborator makes a change, commits, and pushes to GitHub
  • Owner pulls the change to their computer
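A sketch of this exercise that runs on one machine: a local bare repository stands in for the GitHub repo, and two folders stand in for the owner's and the collaborator's computers. All paths, names, and emails are hypothetical; with GitHub, the collaborator would clone the repo's URL after accepting the owner's invitation.

```shell
work=$(mktemp -d)
git init -q --bare -b main "$work/recipes.git"   # stand-in for the GitHub repo

# Owner's computer: a repo with one commit, pushed to the "GitHub" repo
git init -q -b main "$work/owner"
cd "$work/owner"
git config user.name "Owner"; git config user.email "owner@example.com"
printf '# Recipes\n' > README.md
git add README.md
git commit -q -m "Add README"
git remote add origin "$work/recipes.git"
git push -q -u origin main

# Collaborator's computer: clone, make a change, commit, push
git clone -q "$work/recipes.git" "$work/collaborator"
cd "$work/collaborator"
git config user.name "Collaborator"; git config user.email "collab@example.com"
mkdir recipes
printf '# Pancakes\n' > recipes/pancakes.md
git add recipes/pancakes.md
git commit -q -m "Add pancakes recipe"
git push -q

# Owner pulls the collaborator's change (a fast-forward here)
cd "$work/owner"
git pull --ff-only -q
git log --oneline
```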

Collaborating via branching + “Pull Requests”

  • More formal collaboration workflow
  • Allows fearless experimentation
  • Allows efficient code review and discussion
  • Good for bigger contributions


%%{
    init: {
        'theme': 'base',
        'gitGraph': {
            'showCommitLabel': false
        },
        'themeVariables': {
            'git0': '#6cc644',
            'git1': '#f39c12',
            'commitLabelColor': '#325c65',
            'commitLabelBackground': '#325c65'
        }
    }
}%%

gitGraph
    commit id: " "
    commit id: "  "
    commit id: "   "
    commit id: "    "
    branch feature
    commit id: "     "
    commit id: "      "
    commit id: "       "
    commit id: "        "
    checkout main
    merge feature tag: "PR review & merge"
    commit id: "         "
    commit id: "          "

Steps

  1. Create a new branch
  2. Make changes, commit, and push the branch to GitHub
  3. Open a Pull Request (PR) on GitHub
  4. Discuss and review the PR
  5. Merge the PR into the main branch
  6. Pull the latest changes to your local main branch
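A terminal sketch of these steps. Steps 3 to 5 happen in the GitHub web interface, so to keep the sketch runnable offline, a local bare repository stands in for GitHub and a separate clone plays the reviewer who merges the branch with a merge commit, roughly what GitHub's default merge button does. The branch name, files, and merge message are hypothetical.

```shell
# Hypothetical setup: a bare repo stands in for GitHub, with one commit on main
work=$(mktemp -d)
git init -q --bare -b main "$work/recipes.git"
git init -q -b main "$work/local"
cd "$work/local"
git config user.name "Jane Doe"; git config user.email "jane@doe.ca"
printf '# Recipes\n' > README.md
git add README.md && git commit -q -m "Add README"
git remote add origin "$work/recipes.git"
git push -q -u origin main

# 1. Create a new branch and switch to it
git switch -q -c add-soup

# 2. Make changes, commit, and push the branch
mkdir -p recipes
printf '# Soup\n' > recipes/soup.md
git add recipes/soup.md
git commit -q -m "Add soup recipe"
git push -q -u origin add-soup

# 3-5. On GitHub: open the PR, review, merge. Simulated offline here by a
# "reviewer" clone that merges the branch into main with a merge commit
git clone -q "$work/recipes.git" "$work/reviewer"
cd "$work/reviewer"
git config user.name "Reviewer"; git config user.email "reviewer@example.com"
git merge -q --no-ff origin/add-soup -m "Merge branch add-soup (simulated PR)"
git push -q origin main

# 6. Back in your clone: update main, then delete the merged branch
cd "$work/local"
git switch -q main
git pull --ff-only -q
git branch -d add-soup
```

From R, usethis's `pr_init()`, `pr_push()`, and `pr_finish()` wrap similar branch, push, and clean-up steps.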

Licensing and best practices

Share evidence, research and decision making openly. Make all non-sensitive data, information, and new code developed in delivery of services open to the outside world for sharing and reuse under an open licence.

It is recommended that the source code be released as early as possible in the project’s life cycle to avoid the overhead of publishing source code late in the process.

Employees should use their full name and Government of Canada email address for all code contributions to public repositories while acting within the scope of their duties or employment.

Choose a license

For work

By default, a project without an open source licence applied to it would only be released under the Crown Copyright.

  • MIT is a good, permissive license for most projects
  • Apache 2.0 is similar, but adds an explicit patent grant
usethis::use_mit_license(
  copyright_holder = "His Majesty the King in Right of Canada, as represented by the Minister responsible for Parks Canada Agency"
)

For personal projects

usethis::use_mit_license(
  copyright_holder = "Andy Teucher"
)

Your turn

Add a license to your repo via a pull request