Just Enough Git
The cln
app helps you manage your computational lab notebook with
git and
git-annex. While you still need
to know some things about git and git-annex, you don't need too much
-- as long as you're comfortable with some of the fundamentals, you
should be good.
Everyday Commands
Let's go through some everyday git commands.
/tmp/example$ git --version
git version 2.27.0
How to get help
To get a basic overview of command git commands:
$ git help
...prints out the help screen...
For more detailed help, use
$ git help -a
...lots and lots of commands...
To get help with a specific commands do something like this:
$ git help add
...shows help page for git-add
$ git help checkout
...shows help page for git-checkout
To get help with some of the most common git commands, run this:
$ git help everyday
...shows useful commands...
Basic git workflow
Let's run through a basic example git workflow. In this example, I
will make a new project (git repository), make a README.md
, make
some fake reads in a fasta file, do some silly "quality control".
Lines that start with $
are commands that I ran.
Initialize a new git repository.
$ git init
Initialized empty Git repository in /tmp/example/.git/
Make a README file.
$ printf "# My Cool Project
>
> This willl be a really fun project
> " > README.md
Add readme file contents to the "index".
$ git add README.md
Commit changes with message.
$ git commit -m "Initial commit"
[master (root-commit) d93d073] Initial commit
1 file changed, 3 insertions(+)
create mode 100644 README.md
Create and switch to a new branch called make-seqs.
$ git checkout -b make-seqs
Switched to a new branch 'make-seqs'
Make a fasta file.
$ printf ">seq1
> ACTG
> >seq2
> actg
> " > seqs.fa
Add the fasta file to the index.
$ git add seqs.fa
Commit the changes with a message.
$ git commit -m "Make a fake seqs file"
[make-seqs 0748c17] Make a fake seqs file
1 file changed, 4 insertions(+)
create mode 100644 seqs.fa
Switch back to the master branch.
$ git checkout master
Switched to branch 'master'
Merge the changes in the make-seqs branch into the master branch.
$ git merge make-seqs
Updating d93d073..0748c17
Fast-forward
seqs.fa | 4 ++++
1 file changed, 4 insertions(+)
create mode 100644 seqs.fa
Delete the make-seqs branch.
$ git branch -d make-seqs
Deleted branch make-seqs (was 0748c17).
Create and switch to a new branch called qc.
$ git checkout -b qc
Switched to a new branch 'qc'
Do some (fake) "quality control".
$ tr "a" "N" < seqs.fa > seqs.qc.fa
Add the changes/new file to the index.
$ git add seqs.qc.fa
Commit the changes with the given message.
$ git commit -m "Run read QC"
[qc 6a8f070] Run read QC
1 file changed, 4 insertions(+)
create mode 100644 seqs.qc.fa
Check the log to see what we have done.
$ git log --pretty=oneline
6a8f070741e083de38cee9498123c984ee33d514 (HEAD -> qc) Run read QC
0748c17a80587f1e9f5832a1966c6d1ddb9960de (master) Make a fake seqs file
d93d0735c7b05c5d69e80c5c68926e12be1744d8 Initial commit
Switch back to the master branch.
$ git checkout master
Switched to branch 'master'
Okay, that quality control wasn't very good, just delete the qc branch like it never happened.
$ git branch -D qc
Deleted branch qc (was 6a8f070).
Check the log...the "quality control" step is gone!
$ git log --pretty=oneline
0748c17a80587f1e9f5832a1966c6d1ddb9960de (HEAD -> master) Make a fake seqs file
d93d0735c7b05c5d69e80c5c68926e12be1744d8 Initial commit
See...just readme and sequences.
$ ls
README.md seqs.fa
It's the file we made.
$ cat seqs.fa
>seq1
ACTG
>seq2
actg
What is an "index"?
Here is what the git-add
help file says about the "index":
The "index" holds a snapshot of the content of the working tree, and it is this snapshot that is taken as the contents of the next commit. Thus after making any changes to the working tree, and before running the commit command, you must use the add command to add any new or modified files to the index.
git-add followed by more edits
If you run git add some_file.txt
, then change the contents of
some_file.txt
again before commiting the changes, those new
changes will not be reflected in the repository unless you run git
add some_file.txt
again before commiting. Here is an example of what
I mean:
Edit the seqs file with nano.
$ nano seqs.fa
Show the difference from last commit. I edited the first sequence
name. Lines that change have a -
indicated the old version and a
+
indicated the new change.
$ git diff seqs.fa
diff --git a/seqs.fa b/seqs.fa
index 60129d5..53d21e0 100644
--- a/seqs.fa
+++ b/seqs.fa
@@ -1,4 +1,4 @@
->seq1
+>seq1_best_sequence
ACTG
>seq2
actg
Add the changes to the index.
$ git add seqs.fa
Check the status. The changes are now staged for the commit.
$ git status
On branch master
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: seqs.fa
Edit the file again.
$ nano seqs.fa
Show the differences again. I edited the second sequence name.
$ git diff seqs.fa
diff --git a/seqs.fa b/seqs.fa
index 53d21e0..c89e2ff 100644
--- a/seqs.fa
+++ b/seqs.fa
@@ -1,4 +1,4 @@
>seq1_best_sequence
ACTG
->seq2
+>seq2_2nd_best
actg
Commit the staged changes (still just the first name change!)
$ git commit -m "Edit the seqs file"
[master c5c5681] Edit the seqs file
1 file changed, 1 insertion(+), 1 deletion(-)
Check status. See how there are still changes not staged for commit?
That's because git add
stages changes at the time you ran git-add
.
$ git status
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: seqs.fa
no changes added to commit (use "git add" and/or "git commit -a")
The changes are still there in the file though.
$ cat seqs.fa
>seq1_best_sequence
ACTG
>seq2_2nd_best
actg
Check the difference. See how the 2nd header is different?
$ git diff seqs.fa
diff --git a/seqs.fa b/seqs.fa
index 53d21e0..c89e2ff 100644
--- a/seqs.fa
+++ b/seqs.fa
@@ -1,4 +1,4 @@
>seq1_best_sequence
ACTG
->seq2
+>seq2_2nd_best
actg
Add the current changes to the index.
$ git add seqs.fa
Commit the changes.
$ git commit -m "Change 2nd header"
[master 9454d4e] Change 2nd header
1 file changed, 1 insertion(+), 1 deletion(-)
Check the status...the working tree is clean, all changes have been commited.
$ git status
On branch master
nothing to commit, working tree clean
Git Branches
Git has branches to help you keep work isolated. You generally have a
main branch called master
or main
. This is the "main" branch in
the sense that it is the main access point for your repository or
project.
You can do all your work in the main branch, but if you make mistakes and need to roll back any changes, it can be harder than if you do all of your work in a separate branch (i.e., a "working" branch).
Let's see a little example to show you what I mean.
Create a new git repo
Make a new git repository, add and commit an example text file.
$ mkdir example && cd example
$ git init
$ printf "hello, world\n" > hello.txt
$ git add . && git commit -m "Initial commit"
Check the current branch.
$ git branch
* master
See how we are on the master branch to start with? Now, let's create and switch to a new branch.
Create a new branch and do some work
$ git checkout -b edit-hello
Switched to a new branch 'edit-hello'
And now when we run git branch
you can see we are on a different
branch.
$ git branch
* edit-hello
master
Edit the file in the work branch
Now we will edit the hello.txt
file in the new edit-hello
branch
we just created.
$ sed -i 's/l/L/g' hello.txt
That's the sed
command. If you haven't seen it, it is a stream editor. All that
command does is change every l
to a L
in the file.
Now check the status of the repository.
$ git status
On branch edit-hello
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: hello.txt
no changes added to commit (use "git add" and/or "git commit -a")
It says that the hello.txt
file has changed. Let's check the
changes now.
$ git diff hello.txt
diff --git a/hello.txt b/hello.txt
index 4b5fa63..09d7e19 100644
--- a/hello.txt
+++ b/hello.txt
@@ -1 +1 @@
-hello, world
+heLLo, worLd
We can see that hello, world
is now heLLo, worLd
. That's what we
wanted so now we can commit the changes.
Commit the changes in the work branch
$ git add hello.txt
$ git commit -m "Swich lowercase 'l' to uppercase 'L'"
[edit-hello 27e542d] Swich lowercase 'l' to uppercase 'L'
1 file changed, 1 insertion(+), 1 deletion(-)
Just double check that the git repository is clean now.
$ git status
On branch edit-hello
nothing to commit, working tree clean
How has our repository changed
So now we've done some work on the edit-hello
branch, let's switch
back to the master
branch and see check the state of the repository.
$ git checkout master
Switched to branch 'master'
$ git status
On branch master
nothing to commit, working tree clean
$ cat hello.txt
hello, world
Whoa! The hello.txt
file is the same as it was before we switched
branches and did all those changes. Let's check the log.
git log --pretty=oneline
8a5b9042705aff008dc396b385a3f03200e79647 (HEAD -> master) Initial commit
See how that last commit isn't even listed? Let's go back to the
edit-hello
branch and check the log there.
$ git checkout edit-hello
Switched to branch 'edit-hello'
$ git log --pretty=oneline
27e542d2192c3b1807f14b2460118045d549d7f5 (HEAD -> edit-hello) Swich lowercase 'l' to uppercase 'L'
8a5b9042705aff008dc396b385a3f03200e79647 (master) Initial commit
Okay, both commits are still there. Notice how the commit at the top
says (HEAD -> edit-hello)
whereas the first commit says (master)
?
This is telling you that the commits are specific to certain branches.
Just to reinforce this, we will switch back to master
and look at
the logs again.
$ git checkout master
Switched to branch 'master'
$ git log --pretty=oneline
8a5b9042705aff008dc396b385a3f03200e79647 (HEAD -> master) Initial commit
Back to one commit with (HEAD -> master)
. And one more time, back
to edit-hello
branch.
$ git checkout edit-hello
Switched to branch 'edit-hello'
$ git log --pretty=oneline
27e542d2192c3b1807f14b2460118045d549d7f5 (HEAD -> edit-hello) Swich lowercase 'l' to uppercase 'L'
8a5b9042705aff008dc396b385a3f03200e79647 (master) Initial commit
What to do with the changes?
Now we have two options.
- I like the changes in the
edit-hello
branch and want to make them part of themaster
branch. - I don't like the changes in the
edit-hello
branch and want to forget about them.
Keep the changes in the work branch
For the first, I need to merge the changes in the edit-hello
branch
into the master
branch. Let's do that now.
$ git branch
* edit-hello
master
$ git checkout master
Switched to branch 'master'
$ git branch
edit-hello
* master
$ git merge edit-hello
Updating 8a5b904..27e542d
Fast-forward
hello.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
$ git status
On branch master
nothing to commit, working tree clean
$ git log --pretty=oneline
27e542d2192c3b1807f14b2460118045d549d7f5 (HEAD -> master, edit-hello) Swich lowercase 'l' to uppercase 'L'
8a5b9042705aff008dc396b385a3f03200e79647 Initial commit
$ cat hello.txt
heLLo, worLd
See how the second commit is now in the git log and see how the
hello.txt
file has the changes? Since the changes are fully merged,
we can safely delete the edit-hello
branch.
$ git branch -d edit-hello
Deleted branch edit-hello (was 27e542d).
$ git branch
* master
And now the edit-hello
branch is gone.
Get rid of the changes in the work branch
What about the second option. Let's say we don't really like the
changes of the edit-hello
branch, and we just want to get rid of
them. In that case we delete the branch without first merging the
changes.
(Note that the hash values have changed. That's because, for this tutorial, I just deleted the whole repository and started over rather than bother rolling back the changes like I would do in a real repository.)
First, make sure you are on the master
branch.
$ git checkout master
Already on 'master'
$ git branch
edit-hello
* master
Then delete the edit-hello
branch.
$ git branch -d edit-hello
error: The branch 'edit-hello' is not fully merged.
If you are sure you want to delete it, run 'git branch -D edit-hello'.
See the error you got? This is because there are changes present in
the edit-hello
working branch that are not currently part of the
master
branch. So if you were to delete the edit-hello
branch
now, you would lose those changes with no way to get them back. Since
this is actually what we want to do, we will follow the instructions
on the screen and use the -D
flag instead of -d
to force deletion
of the edit-hello
branch.
$ git branch -D edit-hello
Deleted branch edit-hello (was cc722a5).
$ git branch
* master
$ git log --pretty=oneline
f15b723b204deb59b1f23577439ec58d917b2269 (HEAD -> master) Initial commit
$ cat hello.txt
hello, world
As you see, we are back to the initial commit as though all of the
work on the edit-hello
branch never happened.
Wrap up
Git branches are a powerful technique for isolating work in such a way
that you can test things out without worrying about "messing up" your
repository. Then, when you've finished the task you're working on, if
it is good work, you can merge the changes into the master
branch,
or, if it was bad, you can just delete the working branch without
applying any of the changes to the master
branch.
Notes
- Notice how I named the work branch with a short description of what I was actually planning to do? This can be helpful if you have a lot of different branches.