Version control
When carrying out research it can be beneficial to be able to:
- examine the entire history of the project.
- rewind and retrieve a file from the past.
- combine changes from two different parallel pieces of work.
The potential for human error when doing these things can be handled by software. The particular tool we will use is `git.
Setting up git
Before we get started there are a couple of things we need to do to set up
git
. Recall that git
keeps track of the entire history of a project, this
does not only mean keeping track of what was done but also who did it (this is
particularly important in collaborative work).
We start by telling git who we are. Open your command line and type:
git config --global user.name "Your Name"
git config --global user.email "Your Email"
Note this is not data that is being collected by any cloud service or similar. It just stays with your project.
Windows
Note that all these commands work on the anaconda prompt but if you want to use
tab completion you can use the git bash
command line specifically for git.
Initialising a git repository
Let us start by creating a new directory: rsd-checklist
.
Inside that let us create a main.tex
document which we will use to create a
checklist of things learned on this course.
\documentclass{article}
\title{Research software development checklist}
\begin{document}
\maketitle
\end{document}
Now let us tell git
to start keeping an eye on this repository. In the command
line:
git init
You should then see a message saying that you have successfully initialised a git repository.
Staging and committing changes
Let us see what is the status of our repository:
git status
We should see something like:
On branch master
Initial commit
Untracked files:
(use "git add <file>..." to include in what will be committed)
main.tex
nothing added to commit but untracked files present (use "git add" to track)
There are various pieces of useful information here, first of all main.tex
is not currently a tracked file.
We are now going to track that file:
git add main.tex
If we run git status
again we see:
On branch master
Initial commit
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: main.tex
So the changes we have made to main.tex
(ie starting the file) are ready to
be "committed".
git commit
When doing this, a text editor should open up prompting you to write what is called a commit message. Your machine will probably have one of the following command line editors set up as a default:
For the purposes of using git
these are more than sufficient, all you need to
know how to do is:
- Write (in Nano: just type, in Vim: press
i
and type); - Save (in Nano:
Ctrl + O
, in Vim: pressesc
, then:
, thenw + Enter
); - Quit (in Nano:
Ctrl + X
, in Vim: pressesc
, then:
, thenq + Enter
).
Note it is possible to set up a different default editor but instructions for this can be machine specific.
We are now ready to type our first commit message:
Write blank checklist
This file is currently empty but will include the various aspects of sustainable
software development learnt on the 2 day rsd workshop.
Once you have written that, save and exit from your editor. git
should confirm
that you have successfully made your first commit.
[master (root-commit) 3c1e5ad] write blank checklist
1 file changed, 7 insertions(+)
create mode 100644 main.tex
Now if we run git status
, we see a message saying that everything in our
repository is tracked and up to date:
On branch master
nothing to commit, working directory clean
Note A commit message is made up of 2 main components:
<Title of the commit>
<Description of what was done>
- The title should be a description in the form of "if this commit is applied
<title of the commit>
will happen". The convention is for this to be rather short and to the point. - The description can be as long as needed and should be a helpful explanation of what is happening.
A commit is a snapshot that git
makes of your project, you should use this at
meaningful steps of the progress of a project.
Tracking changes to files
Let us add our name to the checklist document:
\documentclass{article}
\title{Research software development checklist}
\author{Grace Hopper}
\begin{document}
\maketitle
\end{document}
Save your file and then run git status
. We now see that git
is aware of a
change to our file:
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: main.tex
no changes added to commit (use "git add" and/or "git commit -a")
To "stage" the file for a commit we use git add
again:
git add main.tex
Now let us commit:
git commit
With the following commit message:
add author name
In this particular case there is not much more needed than the title.
Finally, we can check the status: git status
to confirm that everything has
been done correctly.
Ignoring files
Let us compile our LaTeX document:
pdflatex main.tex
Now if we check the status of our repository:
git status
We see that all the LaTeX auxiliary files are listed along with the pdf:
On branch master
Untracked files:
(use "git add <file>..." to include in what will be committed)
main.aux
main.log
main.pdf
nothing added to commit but untracked files present (use "git add" to track)
To tell git
to ignore these files (we only need the tex
file) we will add them to a blank file entitled .gitignore
. Open your editor and type
main.aux
main.log
main.pdf
Save that file as .gitignore
and then run git status
again. We see now that
git
is ignoring those 3 files but is aware of the .gitignore
file. Let us
add and commit that file:
git add .gigignore
git commit
Use add .gitignore
as the commit message.
At a later stage we can always modify this
.gitignore
file (perhaps we might choose to tell git
to track the pdf):
Exploring and using history
We have done a good job of keep track of the history of our project but let us see how that can be useful.
First, let us confirm our repository is in the expected state with git status
:
On branch master
nothing to commit, working directory clean
Now, let us look at our history:
git log
This displays the full log of the project:
commit b116460039bb9ed1f79a980a7a71b5b75f0618b4
Author: Vince Knight <[email protected]>
Date: Sat Dec 9 11:40:47 2017 +0000
add .gitignore
commit 84c678c01ef099aeedc46c922feb38557cffd5c7
Author: Vince Knight <[email protected]>
Date: Sat Dec 9 11:34:43 2017 +0000
add author name
commit 3c1e5adc13069023d25a76570f092135d0eeb68c
Author: Vince Knight <[email protected]>
Date: Sat Dec 9 10:59:04 2017 +0000
write blank checklist
This file is currently empty but will include the various aspects of
sustainable
software development learnt on the 2 day rsd workshop.
We see that there are 3 commits there, each with a seemingly random set of numbers and characters. This set of characters is called a "hash":
The first commit with title write blank checklist
has hash:
3c1e5adc13069023d25a76570f092135d0eeb68c
. Note that on your machines this
hash will be different, in fact every hash is mathematically guaranteed to be
unique, thus it is uniquely assigned to the changes made.
One of the things that this hash allows us to do is to go back in time. Let us
go back to the main.tex
file before we added the author name. We do this with
the checkout
command and make use of the hash (simply copy it or type the
first few characters). In the case of the example here, using the hash:
git checkout 3c1e5adc13069023d25 main.tex
If we now run git status
we see that the file has changed:
On branch master
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
modified: main.tex
(Open main.tex
and see that the author name is no longer there.)
We could at this stage make more changes to main.tex
and commit them as before.
Or in this particular case we might want to return the file to the state it was.
We can use HEAD
which is shorthand for the hash corresponding to the last
commit:
git checkout HEAD main.tex
If you run git status
you will see that everything is back to how it was with
the author name.
Creating branches
At the start of this chapter we discussed the ability to work in parallel thanks
to git
. This is done using something called "branches".
When typing git status
we have seen that one piece of information regularly
given was On branch master
. This is telling us which branch of "history" we
are currently on. To view all branches:
git branch
This shows:
* master
So currently there is only one branch. Let us create a new branch called
add-list-of-topics
:
git branch add-list-of-topics
When we now type git branch
we see that 2 branches exist but the active branch
is indicated by *
:
add-list-of-topics
* master
Let us now move to this new branch:
git checkout add-list-of-topics
Run git branch
and then git status
to see how this has worked.
Let us now create a tex
folder and in that folder let us write a
list-of-topics.tex
document:
\begin{itemize}
\item Using the command line to be able to control my machine. \checkmark
\item Use basic Python. \checkmark
\item Write modular code. \checkmark
\item Write documented code. \checkmark
\item Write automatde tested. \checkmark
\item Write modular LaTeX. \checkmark
\item Include outputs of software directly in LaTeX. \checkmark
\item Understanding of git.
\item Use github to collaboratively work on projects.
\end{itemize}
Let us add and commit this file:
git add tex/list-of-topics.tex
git commit
Let us now return to the master
branch, we will come back to this list
shortly.
git checkout master
Let us create a new branch called change-page-width
. We'll use this branch to
modify the page width:
git branch change-page-width
git checkout change-page-width
Let us modify main.tex
to include the following in the preamble:
\documentclass{article}
\usepackage[margin=1.5cm, includefoot, footskip=30pt]{geometry}
\title{Research software development checklist}
\begin{document}
\maketitle
\end{document}
If you compile the document you should see it occupies the page better (there's not much text yet to be able to compare but you get the idea).
Stage and commit this change and take a look at the log:
commit f252f5a6fcb79fb19c0bc8e7262a265da560b800
Author: Vince Knight <[email protected]>
Date: Sat Dec 9 12:16:45 2017 +0000
change page width
commit b116460039bb9ed1f79a980a7a71b5b75f0618b4
Author: Vince Knight <[email protected]>
Date: Sat Dec 9 11:40:47 2017 +0000
add .gitignore
commit 84c678c01ef099aeedc46c922feb38557cffd5c7
Author: Vince Knight <[email protected]>
Date: Sat Dec 9 11:34:43 2017 +0000
add author name
commit 3c1e5adc13069023d25a76570f092135d0eeb68c
Author: Vince Knight <[email protected]>
Date: Sat Dec 9 10:59:04 2017 +0000
write blank checklist
This file is currently empty but will include the various aspects of
sustainable
software development learnt on the 2 day rsd workshop.
We see that the commit made on the add-list-of-topics
branch does not appear
here. This is the behaviour of git log
it only shows the history of the
current branch.
Let us return to the master
branch and see how to combine the work that has
been done on each of these branches:
git checkout master
Merging branches
Getting the work from one branch on to another branch is called "merging". Let
us merge the change-page-width
branch in to the master
branch (which is our
current branch).
Note before doing a merge it is always a good idea to check git status
to
ensure everything is in the state we required (e.g. that we are on the correct
branch etc...).
git merge change-page-width
We should see something like:
Updating b116460..f252f5a
Fast-forward
main.tex | 2 ++
1 file changed, 2 insertions(+)
Recall, if you forget the name of branches you can always run git
branch
to see all the branches.
Let us also merge our add-list-of-topics
branch:
git merge add-list-of-topics
When doing merges sometimes git
will open the editor asking for you to confirm
the merge commit message. This is because there are various algorithms git
uses to be able to merge and some require making a snapshot.
If this happens during a merge, do not feel that you have to modify the commit message (unless you want to!).
Merge made by the 'recursive' strategy.
tex/list-of-topics.tex | 11 +++++++++++
1 file changed, 11 insertions(+)
create mode 100644 tex/list-of-topics.tex
Your directory should now look like this:
|---rsd-checklist
|--- main.tex
|--- main.aux
|--- main.log
|--- main.pdf
|--- tex/
|--- list-of-topics.tex
We will make one final modification to main.tex
to include the
list-of-topics.tex
file:
\documentclass{article}
\usepackage[margin=1.5cm, includefoot, footskip=30pt]{geometry}
\usepackage{amssymb} % for the checkmark
\title{Research software development checklist}
\author{Grace Hopper}
\begin{document}
\maketitle
\input{tex/list-of-topics.tex}
\end{document}
Compile your document and take a look at your checklist!
Before we commit this final change let us take a look at one final command:
git diff
This displays the difference between the current state of our repository and the last commit:
diff --git a/main.tex b/main.tex
index bc59d80..4f26c4f 100644
--- a/main.tex
+++ b/main.tex
@@ -1,10 +1,13 @@
\documentclass{article}
\usepackage[margin=1.5cm, includefoot, footskip=30pt]{geometry}
+\usepackage{amssymb} % for the checkmark
\title{Research software development checklist}
\author{Grace Hopper}
\begin{document}
\maketitle
+
+\input{tex/list-of-topics.tex}
\end{document}
(END)
We have added \usepackage{amssymb} % for the checkmark
and
\input{tex/list-of-topics.tex}
.
Now let us stage and then commit this and look at the log:
commit c1689ea9c56603882e19a6ccf90625190db29ec7
Author: Vince Knight <[email protected]>
Date: Sat Dec 9 12:31:34 2017 +0000
include list of topics in main doc
commit 2e94caa07c10beb47e767f7800bf9852b9931783
Merge: f252f5a b71bdc8
Author: Vince Knight <[email protected]>
Date: Sat Dec 9 12:23:42 2017 +0000
Merge branch 'add-list-of-topics'
commit f252f5a6fcb79fb19c0bc8e7262a265da560b800
Author: Vince Knight <[email protected]>
Date: Sat Dec 9 12:16:45 2017 +0000
change page width
commit b71bdc809c48e01fd34d60476d09baae34402b63
Author: Vince Knight <[email protected]>
Date: Sat Dec 9 12:07:52 2017 +0000
write list of topics
Include checkmark for things we have currently covered.
commit b116460039bb9ed1f79a980a7a71b5b75f0618b4
Author: Vince Knight <[email protected]>
Date: Sat Dec 9 11:40:47 2017 +0000
add .gitignore
commit 84c678c01ef099aeedc46c922feb38557cffd5c7
Author: Vince Knight <[email protected]>
Date: Sat Dec 9 11:34:43 2017 +0000
add author name
commit 3c1e5adc13069023d25a76570f092135d0eeb68c
Author: Vince Knight <[email protected]>
Date: Sat Dec 9 10:59:04 2017 +0000
write blank checklist
This file is currently empty but will include the various aspects of
sustainable
software development learnt on the 2 day rsd workshop.
We can see the entire history of what we have written and could potentially go back to any stage. Furthermore, our directory is tidy without the need for various confusin suffixes at the end of each file.
Note Branching is "cheap" in terms of the amount of space it takes up. Thus it's always a good idea to branch when working on a project if you want to try out something.
Note It is possible to create something called a "merge conflict". This
occurs when mergeing changes that git
is unable to merge automatically. For
example changing the same line in a file. In this case, git will ask for you to
modify the file and commit it to complete the merge.
Summary
Here are the commands we have seen in this chapter:
git init
: Create a git repositorygit add <file>
: Start tracking<file>
and/or stage the changes to<file>
ready to be committed.git status
: See the current status of your git repository.git commit
: Create a snapshot of the current changes.git checkout <hash> <file>
: Set<file>
to be in the same state as it was in the commit corresponding to<hash>
. Note thatHEAD
is shorthand for the last commit.git branch
: List all branches.git branch <branch-name>
: Create a new branch called<branch-name>
.git checkout <branch-name>
: Move to<branch-name>
.git merge <branch-name>
: Merge<branch-name>
in to the current branch.git diff
: Show the differences between the current state of the repository and the last commit.
Tip
git
keeps track of things on a line by line basis. Thus changing any given character in any given line corresponds to changing the whole line. In the interest of keeping commits meaningful (thus helpful) it is useful to regularly line return in LaTeX documents otherwise changing various words in a paragraph can correspond to the same overall change.- There are a number of excellent helpful tools for interfacing with
git
. It is recommended to start by learning these basic commands directly so that you understand what is going on in the background before making use of the helpful tools.