Before class
- Set up a class organization on GitHub
- Check the `Allow members to create repositories for this organization` permission
- Set the `Default permissions` for the organization to `None` if you want to avoid students accessing each other’s repositories
- Have students create a GitHub account and email their username to the instructor.
- Add students’ usernames to the organization.
For class
- Download Gaeta_etal_CLC_data.csv.
- Either arrange to have a teaching partner attend class or be logged in to GitHub as another user in the browser for collaboration demos.
- Open the following links in a browser and zoom in to make the images fill the screen.
Live coding demo and assignment are intertwined and designed to work in order.
Introduction
Motivation
- Who has a directory on their computer with a bunch of filenames
- Get rid of messy folders and track changes to things like data files and code in a more manageable way.
Benefits of version control
- Track changes (but better)
- Tracks every change ever made in groups called commits
- Every commit stores the full state of all of your files at that time
- Never lose anything
- Revert or restore to any commit
- Easily unbreak your code/data/manuscript
- No more file name changes
- Collaboration
- Work on things simultaneously
- See what changes others have made
- Everyone has the most recent version of everything
Version control using Git & RStudio
Create a Git repo
- Navigate to GitHub in a web browser and log in.
- Click the `+` at the upper right corner of the page and choose `New repository`.
- Choose the class organization (e.g., `dcsemester`) as the `Owner` of the repo.
- Fill in a `Repository name` that follows the form `FirstnameLastname`.
- Select `Private`.
- Select `Initialize this repository with a README`.
- Click `Create Repository`.
Connect to the Git repo in RStudio
- From the new GitHub repository, click the green `Clone or download` button -> Click the `Copy to clipboard` button.
- In RStudio, File -> New Project -> Version Control -> Git
- Paste the copied URL in `Repository URL:`.
- Leave `Project directory name:` blank; it is automatically given the repo name.
- Choose where to `Create project as subdirectory of:`.
- Click `Create Project`.
- Check to make sure you have a `Git` tab in the upper right window.
Introduce yourself to Git
- Git tab -> `More` -> `Shell`
git config --global user.name "[name]"
git config --global user.email "[email]"
(same as GitHub account email).
That was Exercise 1 - Set-up Git. Have students confirm that this all worked and fix any issues.
First commits
Commit data
- Download the data file Gaeta_etal_CLC_data.csv to your project directory.
- Add the data file to version control
- Two step process:
- Add the data file (checkbox)
- Commit it
- Git -> Select `Gaeta_etal_CLC_data.csv`.
- Commit with message.
Add fish size and growth rate data
- History:
- One commit
- Changes too large to see
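For instructors who want to show what the checkbox and Commit button do underneath, the two-step process maps roughly onto `git add` and `git commit`. A minimal sketch, assuming a scratch directory, a placeholder identity, and a two-line stand-in CSV with made-up contents rather than the real data file:

```shell
set -e
# Work in a throwaway directory so nothing touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"            # placeholder identity for the demo
git config user.email "demo@example.com"

# Stand-in for the downloaded data file (contents are made up).
printf 'length,scalelength\n210,2.5\n' > Gaeta_etal_CLC_data.csv

git add Gaeta_etal_CLC_data.csv                        # step 1: stage (the checkbox)
git commit -q -m "Add fish size and growth rate data"  # step 2: commit
git log --oneline                                      # history now shows one commit
```

Staging and committing are kept separate so that a commit can contain exactly the files you choose.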
Commit R script
- Read in data to new R script.
fish_data = read.csv("Gaeta_etal_CLC_data.csv")
- Save as `fish-analysis.R`.
- Git -> Select `fish-analysis.R`.
- Changes in staged files will be included in next commit.
- Can also see changes by selecting `Diff`
- Commit with message.
Start script comparing fish length and scale size
- History:
- Two commits
- See what changes were made to `fish-analysis.R`
Building a history
- `fish-analysis.R` doesn’t currently show on the `Git` tab
- No saved changes since last commit
- Add some more code to `fish-analysis.R`
- Create new categorical size column
library(dplyr)
fish_data_cat = fish_data %>%
  mutate(length_cat = ifelse(length > 200, "big", "small"))
- Save `fish-analysis.R`.
- Now we see the file on the `Git` tab.
- `M` indicates that it’s been modified.
- To commit these changes, we need to stage the file.
- Check the box next to `fish-analysis.R`.
- Commit with message.
Add categorical fish length column
- History:
- Three commits
- Each `fish-analysis.R` commit shows the additions we made in that commit.
- Modify this code in `fish-analysis.R`
- Change category cut-off size
fish_data_cat = fish_data %>%
  mutate(length_cat = ifelse(length > 300, "big", "small"))
- Save file -> stage -> commit
Change size cutoff for new column
- Green sections for added lines, red for deleted
- Git works line by line.
- The previous version of the line is shown as deleted.
- The new version of the line is shown as added.
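The red and green sections in RStudio correspond to the `-` and `+` lines that `git diff` prints. A minimal sketch, assuming a scratch repository and a one-line stand-in for `fish-analysis.R` (the identity is a placeholder):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"            # placeholder identity for the demo
git config user.email "demo@example.com"

# Commit a one-line stand-in for the script.
echo 'mutate(length_cat = ifelse(length > 200, "big", "small"))' > fish-analysis.R
git add fish-analysis.R
git commit -q -m "Add categorical fish length column"

# Change the cutoff on that line, then ask Git what changed.
echo 'mutate(length_cat = ifelse(length > 300, "big", "small"))' > fish-analysis.R
git diff fish-analysis.R    # old line is prefixed with "-", new line with "+"

git add fish-analysis.R
git commit -q -m "Change size cutoff for new column"
git log --oneline           # two commits
```

Even though only one number changed, the whole line appears once as deleted and once as added, because Git compares files line by line.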
Do Exercise 2 - First Solo Commit and Exercise 3 - Second Solo Commit
Instructor also do exercises
Committing multiple files
- Commits can include multiple files at once
- Let’s move our data file into a `data` subdirectory
- `New Folder` -> `data`
- Checkbox `Gaeta_etal_CLC_data.csv` -> `More` -> `Move`
- Change code to read from new subdirectory
fish_data = read.csv("data/Gaeta_etal_CLC_data.csv")
- Changes to R script indicated by M
- Original datafile has a red D next to it which indicates “deleted”
- New, untracked, data directory
- git initially thinks we’ve deleted `Gaeta_etal_CLC_data.csv` and created a new `Gaeta_etal_CLC_data.csv` file in a new directory.
- Click on both the old and new files to stage them
- git then recognizes that we have moved (or renamed) the file by making the two files into one and marking this with an `R` for “rename”.
- Commit:
Move data file into subdirectory
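The same delete-plus-new-file behavior can be reproduced on the command line, which may help when explaining why both entries have to be staged. A minimal sketch, assuming a scratch repository and a made-up stand-in for the data file:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"            # placeholder identity for the demo
git config user.email "demo@example.com"

# Stand-in data file (contents are made up).
printf 'length,scalelength\n210,2.5\n' > Gaeta_etal_CLC_data.csv
git add Gaeta_etal_CLC_data.csv
git commit -q -m "Add fish size and growth rate data"

# Move the file into a new subdirectory.
mkdir data
mv Gaeta_etal_CLC_data.csv data/
git status --short    # a deleted file (D) and an untracked data/ directory (??)

# Stage both the deletion and the new file.
git add -A
git status --short    # now a single rename entry (R) pointing at data/
git commit -q -m "Move data file into subdirectory"
```

Git does not record moves directly; it infers the rename once it can see that the deleted file and the new file have the same contents.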
Instructor also do exercise
Git as a time machine
Experiment with impunity
fish_data_cat = fish_data %>% mutate(length_cat = ifelse(length > 300, "large", "small"))
- `Save` and show changes are staged
- `More` -> `Revert` -> `Yes`
- Get previous state of a file
- `History` -> select commit -> `View file @ ...`
- Save file over current file
- Copy and paste relevant piece into current file
Delete with impunity
- Both of these also work for deleted files
- Close the upper left window showing `fish-analysis.R`.
- Choose the `Files` tab in the lower right window.
- Select `fish-analysis.R` -> `Delete` -> `Yes`
- Stage deleted file -> `More` -> `Revert` -> `Yes`
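The revert and view-file steps have rough command-line counterparts. A minimal sketch, assuming a scratch repository and a one-line stand-in script (`cutoff = ...` is a made-up placeholder, not the lesson's code); `git checkout HEAD -- <file>` is used here as an approximation of RStudio's `Revert`:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"            # placeholder identity for the demo
git config user.email "demo@example.com"

# Two commits so there is a history to travel through.
echo 'cutoff = 200' > fish-analysis.R
git add fish-analysis.R
git commit -q -m "Add categorical fish length column"
echo 'cutoff = 300' > fish-analysis.R
git commit -q -a -m "Change size cutoff for new column"

# Experiment with impunity: make an unwanted edit, then discard it.
echo 'cutoff = 9999' > fish-analysis.R
git checkout HEAD -- fish-analysis.R
cat fish-analysis.R                 # cutoff = 300

# Get the previous state of a file without changing the current one.
git show HEAD~1:fish-analysis.R     # cutoff = 200

# Delete with impunity: remove the file, then bring it back.
rm fish-analysis.R
git checkout HEAD -- fish-analysis.R
ls                                  # fish-analysis.R
```

Only committed work can be recovered this way, which is one reason to commit often.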
GitHub Remotes
Draw diagram to link local machine with GitHub `origin`.
- So far we’ve worked with a local `Git` repository.
- One of the big benefits of version control is easy collaboration.
- To do this, we synchronize our local changes with a remote repository called `origin`.
- Our remote repository is on GitHub.
- By far the most popular hosted version control site
- Public and private hosted repositories
- Private free for students and academics
- https://education.github.com/
- For the assignment, we’re using private repositories that we made at the beginning.
Push to a remote
- `Push` sends your recent commits to the `origin` remote.
Draw push arrow on diagram on board from local to `origin`.
- Before a `Push`, your commits show in your local history but not on the remote.
Show local commit history and lack of history in remote.
- To `Push` to your remote, select the `Push` button at the top of the `Git` tab.
- Now your changes and commit history are also stored on the remote.
Show local commits now on `origin`.
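The before-and-after state of the remote can also be demonstrated without a network connection. A minimal sketch, assuming a bare repository on disk as a stand-in for the GitHub remote (the directory names and identity are placeholders):

```shell
set -e
work=$(mktemp -d)
cd "$work"

# A bare repository on disk stands in for the GitHub remote.
git init -q --bare origin.git

git init -q local
cd local
git config user.name "Demo User"            # placeholder identity for the demo
git config user.email "demo@example.com"
git remote add origin "$work/origin.git"

echo 'fish_data = read.csv("data/Gaeta_etal_CLC_data.csv")' > fish-analysis.R
git add fish-analysis.R
git commit -q -m "Start script comparing fish length and scale size"

git ls-remote --heads origin    # prints nothing: the remote has no commits yet
git push -q -u origin HEAD      # send the current branch to origin
git ls-remote --heads origin    # now lists the branch and its latest commit
```

In the class workflow the repository is cloned from GitHub, so `origin` is already configured and the `git remote add` step is not needed.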
Have students email a link to their repo to their instructor once they have finished Pushing Changes
The instructor should then commit the following code to their repo with the commit message:
Plot histogram of scale length by categorical size
library(ggplot2)
ggplot(fish_data_cat, aes(x = scalelength, fill = length_cat)) +
  geom_histogram()
Either you (logged in as another user) or your teaching partner should make the same change to your repository.
Pulling
- Big advantage to remotes is easy collaboration
- Avoids emailing files and shared folders where you are never sure if you actually have the most recent version
- Makes it easy to see what collaborators have done
- Automatically combines non-overlapping changes
- While I’ve been talking, a collaborator has added a plot of scale size and fish length to the code.
Show `origin` with collaborator commit.
Add collaborator local repo to diagram and `pull` arrow from `origin` to locals.
- `Pull` the changes from the remote repo with the `Pull` button on the `Git` tab
Show updates to history following `Pull` and run code
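The collaborator scenario can be rehearsed locally before class. A minimal sketch, assuming a bare repository as a stand-in for GitHub and two scratch clones named `you` and `collaborator` (both names and identities are placeholders):

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare origin.git               # stand-in for the GitHub remote

# Your copy: one commit, pushed to the remote.
git init -q you
cd you
git config user.name "You"
git config user.email "you@example.com"
git remote add origin "$work/origin.git"
echo 'fish_data = read.csv("data/Gaeta_etal_CLC_data.csv")' > fish-analysis.R
git add fish-analysis.R
git commit -q -m "Start script comparing fish length and scale size"
git push -q -u origin HEAD

# The collaborator clones, adds the plot, and pushes.
git clone -q "$work/origin.git" "$work/collaborator"
cd "$work/collaborator"
git config user.name "Collaborator"
git config user.email "collaborator@example.com"
printf '%s\n' \
  'ggplot(fish_data_cat, aes(x = scalelength, fill = length_cat)) +' \
  '  geom_histogram()' >> fish-analysis.R
git commit -q -a -m "Plot histogram of scale length by categorical size"
git push -q

# Back in your copy: pull brings in the collaborator's commit.
cd "$work/you"
git pull -q --no-rebase
git log --oneline                           # two commits
```

`--no-rebase` is spelled out because recent Git versions ask for an explicit choice when no pull strategy is configured.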
Do Tasks 3-6 in Exercise 6 - Pulling and Pushing.
Merges
Demo merges either with a partner or by logging into GitHub as another user in the browser.
- What happens if two people make changes at the same time?
- If they edit different parts of the code git will combine them automatically
- If they edit the same areas of the code this requires human intervention
- Merges
- You decide to change the number of histogram bins to 10
geom_histogram(bins = 10)
- Your collaborator reassesses the measurement device and decides it is accurate down to 0.5 mm and pushes the change to the remote repository [make this change in the remote]
filter(scalelength >= 0.5)
- You try to push your change
- Get an error that shows someone else has made a change & you need to incorporate it to push
- Pull
- Merge happens automatically
- You have both sets of changes
- Remote still only has collaborator’s changes
- Push to add the merged version to the remote
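The rejected push followed by an automatic merge can be reproduced locally. A minimal sketch, assuming a bare repository as a stand-in for GitHub, two scratch clones, and a hypothetical `write_script` helper that writes a simplified stand-in for `fish-analysis.R` with the two edited lines several lines apart:

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare origin.git               # stand-in for the GitHub remote

# Hypothetical helper: write a stand-in script with a filter
# threshold ($1) and histogram arguments ($2) several lines apart.
write_script() {
  cat > fish-analysis.R <<EOF
fish_data = read.csv("data/Gaeta_etal_CLC_data.csv")
fish_data = filter(fish_data, scalelength >= $1)

# other analysis steps would go here

ggplot(fish_data, aes(x = scalelength)) +
  geom_histogram($2)
EOF
}

git init -q you
cd you
git config user.name "You"
git config user.email "you@example.com"
git remote add origin "$work/origin.git"
write_script 1 ""
git add fish-analysis.R
git commit -q -m "Start analysis"
git push -q -u origin HEAD

# Collaborator changes the filter threshold and pushes first.
git clone -q "$work/origin.git" "$work/collaborator"
cd "$work/collaborator"
git config user.name "Collaborator"
git config user.email "collaborator@example.com"
write_script 0.5 ""
git commit -q -a -m "Lower scale length threshold to 0.5 mm"
git push -q

# You change the bins; the push is rejected until you pull.
cd "$work/you"
write_script 1 "bins = 10"
git commit -q -a -m "Use 10 histogram bins"
git push -q || echo "Push rejected: pull first"
git pull -q --no-rebase --no-edit           # Git merges the two edits itself
git push -q                                 # remote now has the merged version
cat fish-analysis.R                         # contains both changes
```

The merge succeeds without help because the two edits touch different, well-separated lines of the file.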
Merge conflicts
- If both you and your collaborator edit the same location in the code git doesn’t know how to combine the changes.
- A human has to make this kind of decision.
- You decide to change `"big"` to `"large"`
mutate(length_cat = ifelse(length > 300, "large", "small"))
- Your collaborator changes the size threshold and pushes to the remote
mutate(length_cat = ifelse(length > 250, "big", "small"))
- You attempt to push your changes
- Merge conflict when pulling collaborator’s changes
- This shows as `U` for “unmerged” in RStudio
- First block of code is your version
- Second block is the version on the remote
- Combine into a single block that includes everything
mutate(length_cat = ifelse(length > 250, "large", "small"))
- Click check box next to file
- Commit indicating that it is a merge
- Still not on remote yet
- Push
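A merge conflict can be staged locally in the same way, which is useful for practicing the resolution before class. A minimal sketch, assuming a bare repository as a stand-in for GitHub, two scratch clones, and a one-line stand-in for the script (identities are placeholders):

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare origin.git               # stand-in for the GitHub remote

git init -q you
cd you
git config user.name "You"
git config user.email "you@example.com"
git remote add origin "$work/origin.git"
echo 'mutate(length_cat = ifelse(length > 300, "big", "small"))' > fish-analysis.R
git add fish-analysis.R
git commit -q -m "Add categorical fish length column"
git push -q -u origin HEAD

# Collaborator changes the threshold on that line and pushes.
git clone -q "$work/origin.git" "$work/collaborator"
cd "$work/collaborator"
git config user.name "Collaborator"
git config user.email "collaborator@example.com"
echo 'mutate(length_cat = ifelse(length > 250, "big", "small"))' > fish-analysis.R
git commit -q -a -m "Change size threshold"
git push -q

# You rename the category on the same line, then pull.
cd "$work/you"
echo 'mutate(length_cat = ifelse(length > 300, "large", "small"))' > fish-analysis.R
git commit -q -a -m "Rename big to large"
git pull -q --no-rebase --no-edit || echo "Merge conflict: resolve by hand"
git status --short          # UU marks the unmerged file
cat fish-analysis.R         # your version, then the remote's, between markers

# Resolve: write the combined line, stage, commit the merge, push.
echo 'mutate(length_cat = ifelse(length > 250, "large", "small"))' > fish-analysis.R
git add fish-analysis.R
git commit -q -m "Merge collaborator's threshold change"
git push -q
```

Staging the file is how you tell Git the conflict is resolved; the commit that follows records the merge.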
Full GitHub flow
- Collaborating on GitHub can get more complex with “forks” and “branches”.
Optional: Redraw diagram with local, `origin`, and `upstream`. Arrows from `origin` to/from `upstream` are pull requests and merges.
Show an example of a working repository with branches and forks. Navigate to pull requests.