Quick Reference - git

Reference information for the Git DRCS. For more detailed information, see the following locations:

If you're not already moderately familiar with Git, strongly consider browsing the Git Structure section first as it can make Git operations easier to understand.

If you're just looking for some handy commands, check out the Recipes section.

Configuration

After downloading and installing the package, set up a name and email:

git config --global user.name "Andy Pearce"
git config --global user.email andy@andy-pearce.com

The configuration file to change can be specified by different options:

Option      File
--system    /etc/gitconfig
--global    ~/.gitconfig
<none>      repository/.git/config

Entries at each stage override those earlier in the list.
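
As a rough illustration of this precedence, the following sketch sets user.name at both the global and the repository level and then asks Git which value is in effect. The throwaway HOME directory and the example names are assumptions, used only so that the real ~/.gitconfig is left untouched.

```shell
set -e
# Use a throwaway HOME so the real ~/.gitconfig is never modified.
tmp=$(mktemp -d)
export HOME="$tmp/home"
export XDG_CONFIG_HOME="$HOME/.config"
mkdir -p "$HOME" "$tmp/repo"
cd "$tmp/repo"
git init -q

git config --global user.name "Global Name"   # stored in ~/.gitconfig
git config user.name "Repo Name"              # stored in .git/config

effective=$(git config user.name)             # the value Git actually uses
global_only=$(git config --global user.name)  # the value from the global file only
echo "effective: $effective"
echo "global:    $global_only"
```

The repository-level entry wins because it comes last in the list above.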

Optional Settings

There are a variety of other settings which you may wish to change:

Setting Default Meaning
branch.autosetupmerge true With the default of true, the “upstream” of a local branch which forks off a remote branch is set to that remote branch. If set to always, this behaviour is extended to include branches which fork off other local branches. If set to false, the upstream is never automatically set.
color.ui false Set to true to enable coloured output, or always to enable even if piping output elsewhere.
To apply to only some commands, replace ui with branch, diff, interactive or status.
See the Coloured Output section for details on how to customise the colours used.
core.excludesfile Specify a global .gitignore file to be applied to all repositories (see .gitignore).
diff.tool varies Specify the tool to use for git difftool, either one of the pre-defined list or one defined with difftool "X".cmd.
merge.conflictstyle merge Set to diff3 to include common ancestor content in conflict markers.
merge.defaultToUpstream false If set to true, an unqualified git merge merges in the current branch's “upstream”. If false (the default), an unqualified merge is an error. Note that an unqualified git rebase does use the upstream without changing any settings (i.e. the default behaviour is inconsistent between the two commands).
merge.tool varies Specify the tool to use for git mergetool, either one of the pre-defined list or one defined with mergetool "X".cmd.
push.default matching When doing a git push (see Pushing) without specifying branches on the command-line, by default Git attempts to push all local branches whose names match one on the remote - given that local branches can have a configured “upstream” on the remote, this behaviour may seem suboptimal. Changing this setting to simple makes Git instead only push the current branch, and only if the branch names match and the remote branch is set as the upstream of the local branch. This will become the default in Git 2.0. Can also be upstream for the same behaviour as simple but without the check for matching names and current to match only by name but only for the current branch. The value nothing does nothing without an explicit branch specified on the command-line.
receive.denyNonFastForwards false Set to true to reject any push which isn't a fast forward. This is often set on shared repositories to reduce the scope for losing changes.
receive.denyDeletes false Set to true to disallow deletion of any branch or tag, which may be set on shared repositories to reduce the scope for losing changes. If this is set, the only way to delete these items is to manually remove the appropriate files from the repository directory.

.gitignore

It is also possible to cause Git to ignore certain files in the working directory when performing operations such as git status by creating a file called .gitignore which lists the file patterns to be ignored. This can be created in any directory in the repository and its rules are applied at that level and to all subdirectories. It can also be committed to the repository just like any other file if desired, although it is the working directory version which is applied.

As well as the per-repository files, the core.excludesfile configurable can be used to set a per-user set of exclusions which apply to all repositories. For user-specific exclusions that should only apply to one repository (but which shouldn't be committed like the .gitignore file), the same format of exclusions can be specified in the .git/info/exclude file within the repository.

Each line in the file is interpreted as a pattern to match except for blank lines, which match nothing, and lines which start with a #, which are ignored. The following rules apply to patterns:

  • Patterns not containing a / are applied against the basename of the file and treated as shell globs.
  • Patterns containing / are applied to the full path relative to the .gitignore file and treated as shell globs.
  • Patterns with a trailing / are limited to only match a directory (but not a symlink to a directory).
  • Patterns with a leading / are anchored, so they match only in the same directory as the .gitignore file.
  • Patterns prefixed with a ! are negated — i.e. they may re-introduce a previously-ignored file.

For the avoidance of doubt, the following special characters can be used in the glob patterns:

* Matches zero or more occurrences of any character except /
? Matches exactly one occurrence of any character except /
[] Matches exactly one occurrence of any of the characters listed; ranges of characters can be specified with a hyphen
[!] As [] but matches any character except those listed

To avoid committing the .gitignore file to the repository (for example, when using git-svn), simply add .gitignore as a pattern in the file itself.

See Sample Files for an example of a .gitignore file.
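
The pattern rules above can be tried out in a scratch repository. This is only a sketch: the file names and patterns are made up for illustration, and it relies on git add silently skipping ignored files when given a directory.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
mkdir -p build src
touch main.o keep.o src/util.o build/out.bin src/build

# '*.o' ignores object files at any depth, '!keep.o' re-includes one of them,
# and '/build/' ignores only a directory called build next to this .gitignore.
printf '%s\n' '# build products' '*.o' '!keep.o' '/build/' > .gitignore

git add .                          # ignored files are silently skipped
tracked=$(git ls-files | xargs)    # everything that was actually staged
echo "$tracked"
```

Here src/build survives because it is a plain file rather than a directory and is not at the top level, so /build/ does not match it.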

Creating Repositories

To create an empty Git repository in the current directory:

git init

Any existing files will need to be separately added with git add and git commit — see the section on Committing and reverting.

To clone an existing repository:

git clone git://example.com/path/repo.git

The above command would create a directory repo to hold the new repository — an alternative name can be specified on the command-line after the URL.

The URL itself can be one of several forms:

  • /path/repo — a path to the repository on a locally mounted filesystem.
  • file:///path/repo — as above but doesn't use hardlinks or similar tricks (may be slower).
  • ssh://user@host/path/repo — uses SSH for transfers, common for read/write access.
  • user@host:/path/repo — SSH is the default if unspecified, user is also optional (defaults to current user).
  • git://host/path/repo — the special Git daemon protocol, similar to SSH but with no authentication — typically read access only.
  • http(s)://host/path/repo — as the Git protocol but uses HTTP/HTTPS tunnel — commonly read-only but can be made read/write with WebDAV (with appropriate authentication).

A git clone operation actually carries out several independent tasks:

  • Initialises an empty repository in the appropriate directory (as with git init).
  • Adds the source repository as a new remote under the name origin (see Remote Repositories).
  • Adds remote tracking branches for each branch in the source repository (see Remote Repositories).
  • Checks out the local tracking branch corresponding to the active branch on the source repository (see Branching and Stashing).

Both of the above commands create a repository with a working directory for editing files, and this is the correct option for a repository where commits will be performed. To create a repository for hosting purposes, intended for accessing remotely by pushing and pulling changes, a working directory isn't required. In this case the --bare option can be used to create a bare repository in the current directory. For example:

mkdir repository.git
cd repository.git
git init --bare

This option also applies to the git clone command.
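
A minimal sketch of the hosting arrangement described above, using a local path in place of a real server. The directory names hosting.git and work are arbitrary choices for the example.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

git init -q --bare hosting.git               # no working directory, for pushing and pulling
# Cloning an empty repository prints a warning, which is hidden here.
git clone -q hosting.git work 2>/dev/null    # a normal repository with a working directory

is_bare=$(git -C hosting.git rev-parse --is-bare-repository)
work_is_bare=$(git -C work rev-parse --is-bare-repository)
remotes=$(git -C work remote)                # the clone registers its source as a remote
echo "hosting.git bare: $is_bare, work bare: $work_is_bare, remotes: $remotes"
```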

Committing and Reverting

Prior to committing files to the repository proper, Git requires them to be added to the index, also known as the staging area. From there they can be committed on to a branch in the repository proper.

Staging Files

Adding a file to the index is done with the following command:

git add <path> [...]

… where <path> may be a single file, a glob (escaped from the shell) or a directory to recursively specify all files within it.

The same command is used to identify modified files as well as those which don't yet exist in the repository. In both cases, the file content at the time the command is run is copied into the index, so later changes to the same file will not be reflected in the index unless a further git add is run.
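
To see that the index holds a snapshot taken at the time of git add, the following sketch stages a file and then edits it again. The file name notes.txt is just an example.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q

echo "first" > notes.txt
git add notes.txt                        # the index now holds "first"
echo "second" >> notes.txt               # later edit, not yet staged

staged=$(git show :notes.txt)            # content stored in the index
state=$(git status --short notes.txt)    # "A" = staged, "M" = modified since staging
echo "index content: $staged"
echo "status: $state"
```

A further git add notes.txt would be needed to bring the second line into the index.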

To stage many changes, the add command can be run interactively:

git add -i

This will launch a command prompt which initially shows output similar to this:

           staged     unstaged path
  1:    unchanged        +1/-0 foo/bar/baz.h
  2:    unchanged        +4/-3 foo/bar/baz.cpp
  3:    unchanged        +6/-4 foo/bozzle/README.txt
  4:    unchanged        +9/-1 foo/bozzle/xyz.h
  5:    unchanged      +45/-42 foo/bozzle/xyz.cpp
  6:    unchanged      +97/-94 foo/bozzle/xyzzy.h

*** Commands ***
  1: status	  2: update	  3: revert	  4: add untracked
  5: patch	  6: diff	  7: quit	  8: help
What now>

This sample output shows six locally modified files – for example, README.txt has six lines added and four lines removed. None of the files have any staged changes however.

Entering 2 enters the update mode, and then one or more files can be selected to be staged. Files can be specified in at least the following ways:

3 Add a single file specified by number.
foo/bozzle/R Add a single file specified by unique prefix.
1-3 Add a range of files specified by number.
-2 Remove a file or range from the selected set.
1,-2,3-5 Process multiple items one at a time, each of which may be any of the above.

A carriage return on its own leaves this mode and goes back to the main prompt — choose option 1 to re-show the status at this point to check which files have been staged:

           staged     unstaged path
  1:    +1/-0          nothing foo/bar/baz.h
  2:    +4/-3          nothing foo/bar/baz.cpp
  3:    unchanged        +6/-4 foo/bozzle/README.txt
  4:    unchanged        +9/-1 foo/bozzle/xyz.h
  5:    +45/-42        nothing foo/bozzle/xyz.cpp
  6:    unchanged      +97/-94 foo/bozzle/xyzzy.h

*** Commands ***
  1: status	  2: update	  3: revert	  4: add untracked
  5: patch	  6: diff	  7: quit	  8: help
What now>

Now we can see that changes have moved from the working directory into the index for those files that we've staged. If we made further modifications to the same files before committing then we'd see change summaries in both columns, since the staged files would differ from the current HEAD and the working directory copies would also differ from the staged ones.

The other options perform the following functions:

3 revert Unstages files in the same way as staging them.
4 add untracked Adds files which aren't yet in the repository by path name. Use caution if you have many untracked files as this option searches for untracked files recursively through all extant directories.
5 patch Interactively stages selected portions of a file. This is done on a chunk-by-chunk basis within the diffs for that file, where chunks can be split as necessary to encapsulate the required changes (this can be done directly using git add --patch).
6 diff Performs as git diff --cached — see Showing Changes for details.
7 quit Exit this session (changes will already have been saved). CTRL-D also works.
8 help Show a brief command summary.

After adding files, the git status command can be used to see which files have been modified and which files have been staged for commit. This shows:

  • Changes which would be committed by a git commit command (see later in this section).
  • Tracked files which have been modified in the working directory but not yet staged.
  • Files in the working directory which aren't yet in the repository.

Here is some sample output from it:

# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#	new file:   staged.txt
#
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#	modified:   modified.txt
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#	untracked.txt

See the Showing Changes section for ways of getting more detailed information about changes.

Reverting Staged Files

As the git status output helpfully mentions, to “unstage” a file (i.e. reverse the operation of git add), the following command can be used:

git reset HEAD <path>

The operation of the reset command is subtly different if a path is not specified, so make sure you always pass the path when using it to unstage files. You can find more details about git reset in the Reverting Commits section.

To revert a working file back to the state of the repository, instead use the following command:

git checkout -- <path>

This command has the potential to destroy changes (indeed, that's the point of it) so use with caution. This is one of the few times that Git won't warn you before overwriting a local file.
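
The two commands can be seen working in sequence in the sketch below. The file name and the identity configured for the scratch repository are placeholders.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.name "Example User"          # placeholder identity for this scratch repo
git config user.email "user@example.com"

echo "original" > file.txt
git add file.txt
git commit -q -m "Add file"

echo "edited" > file.txt
git add file.txt                             # stage the edit
git reset -q HEAD file.txt                   # unstage it; the working copy keeps the edit
after_reset=$(git status --short file.txt)   # modified but not staged
git checkout -- file.txt                     # discard the edit (destructive!)
after_checkout=$(cat file.txt)
echo "after reset: '$after_reset', after checkout: '$after_checkout'"
```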

Committing Changes

The index can be freely manipulated with as many git add and git reset commands as necessary. To commit these staged changes to the repository use the following command:

git commit

Make sure you've set up your email address and name (see Configuration) before your first commit.

This will invoke your editor to enter a suitable commit comment — alternatively the -m option can be used to specify one on the command-line. On the subject of git commit messages, there is a pretty strong convention on how they're formatted which tends to work well with the output formats of the various tools:

Short summary, capitalised and under 50 chars.

A blank line under the summary, followed by one or more paragraphs of
more detailed comments about the commit. The format of these is
relatively free-form, but should be wrapped at around 72 columns.

Once a commit has been made, it's still possible to revert it with Git but the consequences potentially affect other people. The process for doing this is covered in the Changing History section.

The git commit command also has a shortcut to avoid the need for a git add:

git commit -a

This does an implicit git add on any locally modified tracked file. It's important to note that only tracked files will be caught — that is, those files which have previously been added to the repository. Any files which have been created but never added with git add will be ignored for the purposes of git commit -a.
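
A short sketch of this behaviour, with made-up file names: the tracked file is committed by git commit -a, while the brand new file is left behind as untracked.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.name "Example User"      # placeholder identity for this scratch repo
git config user.email "user@example.com"

echo "one" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked file"

echo "two" >> tracked.txt                # modification to a tracked file
echo "new" > untracked.txt               # never added, so not tracked
git commit -q -a -m "Update tracked file"

remaining=$(git status --short)          # only the untracked file is left over
echo "$remaining"
```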

Cleaning Working Directory

It is occasionally useful to remove files from the working directory which don't correspond to files in the repository. For example, to remove the by-products of a build process. This can be achieved with the git clean command. Typically you should run this with -n first to see what action would be taken:

git clean -n

Then the actual operation can be effected with the -f flag:

git clean -f

By default, git clean refuses to take any action unless either -n or -f is specified; this behaviour can be changed with the clean.requireForce configurable.

Some useful options modify the behaviour of this command:

-d Remove directories as well as files (directories are skipped by default).
-x Also remove files listed in .gitignore (which are skipped by default).
-X Only remove files listed in .gitignore (useful for build products).
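
The effect of these options can be checked in a scratch repository. In this sketch main.o stands in for a build product and scratch.txt for a stray file; both names are invented for the example.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
printf '*.o\n' > .gitignore
git add .gitignore                # staged, so it is not treated as untracked
touch main.o scratch.txt

preview=$(git clean -n)           # dry run: report what would be removed
echo "$preview"
git clean -q -f                   # removes scratch.txt, skips the ignored main.o
after_f=$(ls | xargs)
git clean -q -f -X                # removes only ignored files
after_X=$(ls | xargs)
echo "after -f: '$after_f', after -f -X: '$after_X'"
```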

Branching and Stashing

As discussed in the Git Structure section a branch in Git is simply a pointer to a commit, which itself contains a pointer to a snapshot of the entire repository as well as a link to a parent commit. A newly-created repository has only the master branch, whereas a cloned repository will contain all of the branches of the remote repository.

Branching

To create a new branch locally:

git branch mynewbranch

This will create a new branch pointing at the same commit as the current HEAD, which is a reference to the current branch. At this point the working directory still reflects the original branch; to switch to the branch just created:

git checkout mynewbranch

This shouldn't change any files in this case because the branch hasn't yet had any commits on it, but it will move the HEAD pointer so any new commits will be made on mynewbranch. Creating a new branch and then immediately checking it out is such a common operation that the checkout command has a shortcut for it:

git checkout -b mynewbranch

The process of adding and committing files is exactly the same on a branch and these operations will always use the current branch.

To list the current local branches:

git branch

The -r and -a options can be used to also include remote branches. The output of the git branch command is a simple list of branch names with an * indicating the current branch. The -v option includes the most recent commit on each branch:

  master           10283c1 Fixed bug 1234.
* mynewbranch      effc2bb Experiments to boost speed.
  someotherbranch  17e77a6 Support IPv6.
  yetanotherbranch 87dfa34 Translate comments to Klingon.

Branch names can include a slash (/) as a separator to group related branches under a namespace. For example, features/ipv6-support. In principle any character can be so used, but slashes are conventional. Since branch names are stored as files under the .git directory, slashes will cause a subdirectory to be created. This helps keep metadata organised, but means that it's not possible to have, say, a features branch in parallel with the features/ipv6-support branch, because a file and a directory cannot have the same name in the same parent directory.
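
The name clash is easy to reproduce. In the sketch below (branch names taken from the example above, everything else invented), Git refuses to create features/ipv6-support once a branch called features exists.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.name "Example User"      # placeholder identity for this scratch repo
git config user.email "user@example.com"
echo "hello" > README.txt
git add README.txt
git commit -q -m "Initial commit"

git branch features                      # occupies the name "features"
# "features" can no longer act as a namespace, so this is expected to fail.
if git branch features/ipv6-support 2>/dev/null; then
  clash=no
else
  clash=yes
fi
echo "name clash: $clash"
```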

To illustrate how the branching is occurring within the repository, imagine a small repository with a single master branch and three commits:

At this point a new branch is created:

git checkout -b feature-x

At this point let us imagine that a couple of new commits are done on this branch:

...
git commit

If a merge back into master was performed at this point, Git would intelligently perform a fast forward, since all it requires is moving the master branch to a later commit. Unlike a standard merge this will not result in the creation of a new commit (see Merging for details). However, let's make the situation a little more complicated by assuming that the user switches back to master and performs some additional commits:

git checkout master
...
git commit

The process of merging feature-x back into master is discussed in the section on Merging and Rebasing.
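
The scenario can be recreated in a scratch repository to inspect the resulting graph. This is a sketch under assumptions: the commit names commit1 to commit7 are inferred from the later sections, and make_commit is a hypothetical helper which records one file per commit.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/master  # force the branch name regardless of local defaults
git config user.name "Example User"      # placeholder identity for this scratch repo
git config user.email "user@example.com"

# Hypothetical helper: one new file per commit, named after the commit.
make_commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

make_commit commit1; make_commit commit2; make_commit commit3
git checkout -q -b feature-x
make_commit commit4; make_commit commit5
git checkout -q master
make_commit commit6; make_commit commit7

git log --graph --oneline --all          # draws the two diverging lines of history
fork_point=$(git log -1 --format=%s "$(git merge-base master feature-x)")
echo "branches diverge at: $fork_point"
```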

Stashing

Since checking out a branch typically changes files in the working directory, Git won't let you do this if it would overwrite uncommitted changes. Of course, it's not always convenient to commit changes before switching branches and so there is an easy way to save these changes away:

git stash [save <message>]

This command takes modified files in both the working directory and the index and pushes them on to a stack of stashed files. It then reverts all these files back to their HEAD state, thus leaving the working directory clean.

If you're working on a local feature branch, consider simply committing instead of stashing. You can always merge commits and change commit messages via an interactive rebase later. This keeps your changes attached to the appropriate branch, whereas stashed changes are “floating” and may be applied to any other branch.

The example above shows that git stash is in fact short for git stash save, and using this longer form allows an optional message to be specified describing the stashed changes (similar in principle to a commit message).

By default only tracked files within the repository are stashed away, leaving untracked files unchanged in the working directory. The -u option (or --include-untracked) causes untracked files to also be saved away and then deleted, as with git clean. The -a option (or --all) is similar but also includes ignored files. As well as these inclusive options, the --keep-index option can be used to leave the index alone, so that any changes already staged for commit remain in place after stashing.

To selectively stash a subset of the local changes, use the --patch option to save, which will enter an interactive session similar to interactive adding (see Staging Files) where the hunks to save can be selected.

To see the stack of stashed changes:

git stash list

To re-apply a stashed change:

git stash apply [--index] [<stash>]

The --index option will additionally re-instate staged files in the index, although be aware that this can fail if there are already staged files. Without this option manual git add commands will be required to add them back into the index. By default the most recent stash is used, but earlier ones can be specified. As shown by the git stash list command output, stashes are referred to by the syntax stash@{x} where x is 0 for the most recent stash, 1 for the one previous to it and so on. This will become more clear in the later section on Specifying Revisions.

Stashes can be applied back to the same branch from whence they came or can equally be applied to a different branch — they are essentially just a diff which can be applied back as a patch to any state of the working directory. Other useful commands for dealing with stashes:

git stash drop [<stash>]
git stash pop [<stash>]
git stash show [-p] [<stash>]

The drop command can be used to remove old stashes that are no longer required and the pop command is a shortcut for performing an apply followed by an immediate drop. If the stash doesn't apply cleanly, the drop is skipped and must be done manually once the conflicts are resolved. The show command displays a summary of the stash and, with the -p option, shows the diff as well. Strictly speaking this command accepts any format that git diff does (see the Showing Changes section for details).

The git stash show command can also be used with the git apply command to “unapply” a stash which was applied to the local tree in error:

git stash show -p <stash> | git apply -R [--index]

This simply displays the diff of the stash in standard unified diff format and then pipes it into the git apply command to be applied to the working directory (and index if --index is specified). The -R option causes the patch to be reversed, which should remove its effects from the listed files. If there have been other intervening changes in the meantime, of course, then conflicts could occur so this command is probably best saved for cases immediately after a stash was accidentally applied.
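
The whole stash cycle, including the “unapply” trick, can be sketched as follows. The file name and stash message are placeholders, and the example assumes no other changes are made between applying and unapplying.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.name "Example User"      # placeholder identity for this scratch repo
git config user.email "user@example.com"
echo "committed" > file.txt
git add file.txt
git commit -q -m "Add file"

echo "work in progress" >> file.txt
git stash save "WIP: half-finished edit" >/dev/null
after_stash=$(git status --short)                # empty: working directory is clean
git stash apply >/dev/null                       # re-apply, keeping the stash on the stack
after_apply=$(git status --short)
git stash show -p 'stash@{0}' | git apply -R     # reverse the patch to "unapply" it
after_unapply=$(git status --short)
git stash drop -q 'stash@{0}'                    # finally discard the stash
remaining=$(git stash list | wc -l | tr -d ' ')
echo "stashes left: $remaining"
```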

Merging and Rebasing

Once work has been committed on a branch there will come a point where it's time to move that work on to another one — typically when a fix or feature is complete, the work will need to merge back into master for example. There are two ways to approach this:

  • Merge: This typically creates a new commit which merges changes in two branches, preserving the history of both.
  • Rebase: This takes the changes on one branch and applies them as patches to another, preserving a linear history on the target branch.

To illustrate both of these methods, the example from the Branching section will be continued:

Now the time comes to move the work on the new feature back into master. The following two sections examine the two basic approaches to achieving this.

Merging

This section outlines a merge process where two branches both have changes which must be combined. If one branch has had no changes since the branch point, Git performs a fast forward which simply involves moving the branch to point to a different commit. This doesn't involve the creation of a new merge commit and is hence an exception to the discussion below.

To merge the changes from feature-x into master, execute the following:

git checkout master
git merge feature-x

While it is possible to run git merge when there are uncommitted changes in the working directory, it is strongly discouraged as it can lead to changes which are hard to back out if conflicts occur. If necessary, use git stash to save changes away prior to merging.

This command first identifies commit3 as the most appropriate common ancestor for the merge, and then attempts to merge the changes between this, commit5 and commit7. It also decides which merge strategy to use, either based on the command-line option or some builtin rules — typically Git will select the best strategy itself.

If there are no conflicts during the merge process, the changes are immediately committed to the repository and the merge process is complete. If there are conflicts, however, Git will display a message to that effect and modify the affected files to include standard conflict-resolution markers:

<<<<<<< HEAD:main.c
      int i, j;
      for (i = 0; i <= entries; i++) {
        for (j = 0; j <= entry[i].indicies; j++) {
          log_debug("checking entry %d/%d", i, j);
          checkEntryItem(entry[i].index[j]);
=======
      for (size_t i = 0; i <= entries; ++i) {
        for (size_t j = 0; j <= entry[i].indicies; ++j) {
          log_debug("checking entry %zu/%zu", i, j);
          checkEntryIndexItem(entry, i, j);
>>>>>>> feature-x:main.c

In this case the top block indicates the state of the file as it is in master (i.e. the current HEAD) and the bottom block indicates the same section of the file in feature-x.

This default format doesn't display the common ancestor's content. To enable this, set the merge.conflictstyle configurable to diff3.

A git status command will list the files which haven't yet been merged. These conflicts can be resolved by manually editing files or Git can invoke an external merge tool:

git mergetool

This will prompt for which external tool to launch to resolve the conflicts — the default tool can be changed with the merge.tool configuration setting.

To facilitate the conflict resolution process a new pointer MERGE_HEAD is created to point to the branch being merged — the existing HEAD pointer is left on the target branch. The index is updated to store up to three versions of each conflicting file:

  1. The common ancestor of the two branches being merged.
  2. The file as it is on HEAD.
  3. The file as it is on MERGE_HEAD.

These three versions can be examined during the merge process by using:

git show :1:<filename>

… where :1: shows the common ancestor, :2: shows the HEAD version and :3: shows the MERGE_HEAD version.

As each conflicted file is resolved add it back into the index with git add as normal. Once all the conflicts have been resolved then a git commit will finalise the operation. Alternatively, to clean up the index and working directory back to HEAD execute the following:

git merge --abort
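
The three index stages can be inspected by provoking a deliberate conflict. The sketch below assumes a one-line file called main.c which is changed differently on master and feature-x; the contents are placeholders.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/master  # force the branch name regardless of local defaults
git config user.name "Example User"      # placeholder identity for this scratch repo
git config user.email "user@example.com"

echo "base" > main.c
git add main.c
git commit -q -m "Common ancestor"
git checkout -q -b feature-x
echo "feature" > main.c
git commit -q -a -m "Change on feature-x"
git checkout -q master
echo "master" > main.c
git commit -q -a -m "Change on master"

git merge feature-x >/dev/null 2>&1 || true   # expected to stop with a conflict
ancestor=$(git show :1:main.c)                # common ancestor version
ours=$(git show :2:main.c)                    # HEAD version
theirs=$(git show :3:main.c)                  # MERGE_HEAD version
git merge --abort                             # back to the pre-merge state
restored=$(cat main.c)
echo "1=$ancestor 2=$ours 3=$theirs restored=$restored"
```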

Once the merge change has been committed, the tree looks like this:

At this point the feature-x branch can be deleted:

git branch -d feature-x

Rebasing

To rebase the changes from feature-x on to master, execute the following:

git checkout feature-x
git rebase master

This will take the changes from the branch and re-apply them on top of the commits already on master. The feature-x branch is updated to point to these rebased changes:

This has performed the following operations:

  • Identify the common ancestor of the two branches (commit3 in this example).
  • Save away the diffs between that commit and the branch into temporary files.
  • Restore the state of the repository to target branch (master in this example).
  • Apply each diff from the branch in turn on top of the target branch.

As with merging, if there are no conflicts during this process then the new changes are immediately committed to the repository and the branch is updated as per the diagram above. However, it's important to note that the master branch hasn't been updated: all that's happened is that the commits on feature-x have been recreated on top of the current master. If further changes were to be committed to master at this point, the branches would fork again. Due to the rebase, however, an immediate git merge of feature-x into master will perform a fast forward of the branch, and so will not require an additional merge commit.

When doing the merge, it can be helpful to use the --ff-only option to make sure that the merge won't proceed unless it's a true fast forward operation.
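
The rebase followed by a fast-forward merge can be sketched as below. As before, the commit names are inferred from the text and make_commit is a hypothetical helper which adds one file per commit, so that no conflicts arise.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/master  # force the branch name regardless of local defaults
git config user.name "Example User"      # placeholder identity for this scratch repo
git config user.email "user@example.com"
make_commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

make_commit commit1; make_commit commit2; make_commit commit3
git checkout -q -b feature-x
make_commit commit4; make_commit commit5
git checkout -q master
make_commit commit6; make_commit commit7

git checkout -q feature-x
git rebase -q master                     # recreate commit4 and commit5 on top of commit7
git checkout -q master
git merge -q --ff-only feature-x         # only moves the master pointer forwards

history=$(git log --format=%s | xargs)   # newest first
merges=$(git rev-list --merges HEAD | wc -l | tr -d ' ')
echo "$history (merge commits: $merges)"
```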

One important thing to note about rebasing is that it will discard changes where the diff appears identical to one already on the target branch. This means that the commit meta-information (author, message, etc.) of the commit on the source branch will be lost.

It's also possible to rebase on to a different branch than the parent with the --onto option. Consider this repository where branch feature-x has also had a branch created from it called feature-x-y:

Executing the following command:

git rebase --onto master feature-x feature-x-y

… will result in the following repository structure:

You'll note from the above command that providing the source branch in the rebase command will automatically check that branch out prior to performing the rebase. This is a useful shortcut to performing the git checkout manually, but it will still abort in the same circumstances (e.g. uncommitted changes).
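
The --onto form can be tried in the same way. This sketch guesses at the layout of the example: feature-x carries commit4 and commit5, feature-x-y adds commit6 on top of it and master gains commit7; these commit names and the make_commit helper are assumptions.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/master  # force the branch name regardless of local defaults
git config user.name "Example User"      # placeholder identity for this scratch repo
git config user.email "user@example.com"
make_commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

make_commit commit1; make_commit commit2; make_commit commit3
git checkout -q -b feature-x
make_commit commit4; make_commit commit5
git checkout -q -b feature-x-y
make_commit commit6
git checkout -q master
make_commit commit7

# Move only the commits which are on feature-x-y but not on feature-x.
git rebase -q --onto master feature-x feature-x-y

current=$(git rev-parse --abbrev-ref HEAD)           # the rebase checked this branch out
moved=$(git log --format=%s feature-x-y | xargs)
untouched=$(git log --format=%s feature-x | xargs)
echo "on $current: $moved"
```

Note that commit4 and commit5 are not carried across, and feature-x itself is left where it was.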

If conflicts occur the resolution process is similar to that of git merge, but proceeds only a single commit at a time. The output of the git rebase command will indicate which files have issues and the changes must be resolved before the process can continue. As each file is resolved it's updated in the index with git add, as with the merge process, but then instead of a git commit the rebase process continues with:

git rebase --continue

To instead abort the rebase process:

git rebase --abort

To skip an individual commit and continue with the next one:

git rebase --skip

As well as the standard rebasing above, it's possible to run the same command interactively. This allows you to re-order as well as merge commits. This process is covered in the Changing History section.

Merge Process Quick Reference

Cherry Picking

Merging an isolated change from another branch is known as cherry-picking. This can be thought of as a rebase for a single change to move it from one branch to another, followed by an automatic fast-forward merge. To do this, run the following command on the specific commit:

git cherry-pick <commit> [...]

This will pull the specified change or changes from their own branch into the current branch. Conflicts are handled as with a standard rebase operation. If necessary this step can be repeated for other changes and then an interactive rebase performed on a parent commit to squash them into a single commit (or as required).

The change is essentially “copied” from the source branch into the destination, so it still remains on both branches after the operation. If a later git rebase is performed to move the remaining changes on to the destination branch, the duplicate change will be included in the set of changes to rebase but typically be detected and automatically ignored. Effectively this means that the commits may become reordered on the destination branch.

For example, consider the following branch scenario:

Performing the following command:

git cherry-pick <SHA1 of commit5>

… will result in:

Note that unlike a standard rebase, the destination branch (master in this case) has also been updated as with a git merge. Now suppose a standard rebase is performed:

git rebase master feature-x

… then the repository would become:

This demonstrates that commit5 and commit4 have effectively been reordered.
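
The reordering can be reproduced in a scratch repository. The sketch assumes master holds commit1 to commit3 and feature-x adds commit4 and commit5, each touching its own file so the cherry-pick applies cleanly; make_commit is a hypothetical helper.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/master  # force the branch name regardless of local defaults
git config user.name "Example User"      # placeholder identity for this scratch repo
git config user.email "user@example.com"
make_commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

make_commit commit1; make_commit commit2; make_commit commit3
git checkout -q -b feature-x
make_commit commit4; make_commit commit5
git checkout -q master

git cherry-pick feature-x >/dev/null           # copies commit5 (the tip of feature-x)
on_master=$(git log --format=%s master | xargs)

# The duplicate of commit5 is detected and skipped; any notice about it is hidden.
git rebase -q master feature-x 2>/dev/null
on_feature=$(git log --format=%s feature-x | xargs)
echo "master:    $on_master"
echo "feature-x: $on_feature"
```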

Specifying Revisions

There are various places in Git where either a specific revision or a range of revisions must be specified. There are a variety of ways this can be done.

Single Revisions

The following methods all refer to a specific commit object:

Method Example Explanation
SHA1 734713bc047d87bf7eac9674765ae793478c50d3 The canonical way to refer to a commit is its full SHA1 hash. This is guaranteed to be unique within the repository (if two commits happen to hash to the same SHA1 then this will cause problems — this is quite unlikely, however).
Truncated SHA1 734713b Since full SHA1s tend to be a little cumbersome, Git allows them to be truncated to any prefix of at least four characters which is unique within the repository.
Branch name HEAD A branch name, or a pointer to a branch like HEAD, is entirely equivalent to specifying the commit to which the branch points.
Reflog by ID HEAD@{3} A reference to a recently-used HEAD or branch position — see the section Reflog below.
Reflog by time master@{1.week.ago} For anything still in the reflog, the state of a branch or HEAD can be specified by time.
Parent HEAD^ Appending ^ to the end of a reference indicates the parent of that reference — i.e. the previous commit. This can be repeated multiple times, so ^^ indicates a grandparent, and so on.
N generation parent HEAD~3 A shorthand for multiple ^ symbols — for example, HEAD~3 is equivalent to HEAD^^^.
Nth parent HEAD^2 For commits which have multiple parents (i.e. merge commits), appending a number to the ^ moves to the other parents of the commit. For a commit which merges master with mybranch, for example, HEAD^ refers to the final commit on master before the merge and HEAD^2 refers to the final commit on mybranch before the merge.

These references can be combined arbitrarily. For example, d4f213^2~3 means “the great-grandparent of the second parent of the commit whose prefix is d4f213”. The diagram below, taken from the gitrevisions man page, demonstrates some of the parent reference possibilities:

The following references are equivalent:

A A^0
B A^ A^1 A~1
C A^2
D A^^ A^1^1 A~2
E B^2 A^^2
F B^3 A^^3
G A^^^ A^1^1^1 A~3
H D^2 B^^2 A^^^2 A~2^2
I F^ B^3^ A^^3^
J F^2 B^3^2 A^^3^2
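The parent syntax can be checked with git rev-parse, which prints the commit a reference resolves to. This sketch builds a small history with one merge commit; the repository, branch and message names are illustrative assumptions.

```shell
set -eu
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master

git commit -q --allow-empty -m "base"
git checkout -q -b mybranch
git commit -q --allow-empty -m "work on mybranch"
git checkout -q master
git commit -q --allow-empty -m "work on master"
git merge -q -m "merge mybranch" mybranch   # HEAD is now a commit with two parents

first_parent=$(git rev-parse HEAD^)     # same as HEAD^1 and HEAD~1
second_parent=$(git rev-parse HEAD^2)   # the tip of mybranch before the merge
grandparent=$(git rev-parse HEAD~2)     # same as HEAD^^
git log --oneline --graph
```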

Reflog

The reflog is a list of references to recent positions of different branches. Every time a branch tip is moved on to a different commit, a new entry is created in the reflog for it. These entries persist for a configurable amount of time (specified by the gc.reflogExpire setting) and are then removed as part of the standard Git garbage collect cycle, which also cleans up unreachable commits.

To list all the current entries in the reflog:

git reflog [show]

The default action is show which yields output such as this:

734713b... HEAD@{0}: commit: fixed refs handling, added gc auto, updated
d921970... HEAD@{1}: merge phedders/rdocs: Merge made by recursive.
1c002dd... HEAD@{2}: commit: added some blame and merge stuff
1c36188... HEAD@{3}: rebase -i (squash): updating HEAD
95df984... HEAD@{4}: commit: # This is a combination of two commits.
1c36188... HEAD@{5}: rebase -i (squash): updating HEAD
7e05da5... HEAD@{6}: rebase -i (pick): updating HEAD

The entry in the second column is a unique reference name that can be used to refer to that commit. As mentioned in the Single Revisions section, it's also possible to replace the number with a time reference such as HEAD@{yesterday}, HEAD@{14:25} or HEAD@{2.hours.30.minutes.ago} (note the use of dots instead of spaces). This will refer to the most recent commit at or prior to that time.

The reflog entries can be manually expired with git reflog expire and a specific entry deleted with git reflog delete, but these commands should rarely be required.
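One common use of the reflog is recovering a commit after a branch has been moved away from it. A minimal sketch, assuming a scratch repository with illustrative commit messages:

```shell
set -eu
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master

git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"
lost=$(git rev-parse HEAD)

git reset -q --hard HEAD~1       # "second" is no longer on any branch
git reflog show -n 3             # HEAD@{1} still records the old position
git reset -q --hard 'HEAD@{1}'   # move the branch back to it
```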

Revision Ranges

Commands such as git log operate on a set of commits, not just a single one. To that end, the syntax for single revisions is augmented in various ways to allow the specification of a range of commits:

The syntax used by git diff appears superficially similar but is not the same!

Method Example Explanation
Single revision r1 Specifies the commit r1 as well as all commits which can be reached by following parent links of r1.
Multiple revisions r1 r2 The union of all commits reachable from either r1 or r2 or both.
Negated revisions r1 ^r2 Prefixing a revision with ^ excludes revisions reachable from that commit from the range, so the example here is all revisions reachable from r1 except those reachable from r2.
Revisions range r1..r2 A convenient shortcut entirely equivalent to ^r1 r2.
Symmetric difference r1...r2 Three dots gives the set of commits reachable from either of two others, but not those reachable from both.
Parents only r1^@ Commits reachable from r1 but not r1 itself.
Exclude parents r1^! The commit r1 but no other commits reachable from it.

Using the example from earlier:

… here are some examples of the result of various range specifications:

Range      Matching commits
D          G H D
D F        G H I J D F
G..D       H D
^D B       E I J F B
B...C      G H D E B C
^D B C     E I J F B C
C^@        I J F
F^! D      G H D F
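The range forms can also be tried on a much smaller history than the one in the table. This sketch assumes two branches, master and topic, which each add one commit on top of a shared base; all names are illustrative.

```shell
set -eu
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master

git commit -q --allow-empty -m "base"
git checkout -q -b topic
git commit -q --allow-empty -m "t1"
git checkout -q master
git commit -q --allow-empty -m "m1"

only_topic=$(git log --format=%s master..topic)   # reachable from topic but not master
negated=$(git log --format=%s topic ^master)      # the same range written out in full
either=$(git log --format=%s master...topic | sort | tr '\n' ' ')   # on one side only
just_tip=$(git log --format=%s master^!)          # master itself, no ancestors
```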

Showing Changes

Git offers three main commands to view the status and history of files:

  • git status shows a summary of the state of the working directory and index.
  • git log shows information about historical commits.
  • git diff shows the differences between files in the repository, index and working directory.

git status

The git status command has already been introduced in the Staging Files section. This command gives a useful overview of the state of both the working directory and the index. One point that's worth noting is that the behaviour with regard to untracked files (i.e. those in neither index nor repository) can be customised with the -u option:

-uno Omit untracked files from the results entirely.
-unormal Show top-level untracked files and directories (the default).
-uall Recurse into directories when showing untracked files.
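The difference between the three modes is easiest to see with an untracked directory. This sketch uses the --porcelain option (not covered above) purely to get stable, script-friendly output; the directory and file names are assumptions.

```shell
set -eu
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo && cd repo
git commit -q --allow-empty -m "base"

mkdir newdir && touch newdir/a.txt newdir/b.txt

none=$(git status --porcelain -uno)        # untracked files omitted entirely
normal=$(git status --porcelain -unormal)  # only the top-level directory is listed
all=$(git status --porcelain -uall)        # each untracked file is listed
printf '%s\n' "$normal" "$all"
```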

git log

The git log command traverses the history of the repository and shows details of a specified subset of commits in a configurable format. By default it shows commit comments, authors and dates (but not diffs) for all commits starting at the current branch head and going back to the initial commit.

To show the commits in a specified range, one of the following forms can be used, where <since> and <until> may be commit hashes, branch names or any of the other means of referring to commits mentioned in Specifying Revisions:

git log <since>..<until> Changes committed after <since> and up to and including <until>.
git log <until> Change <until> and all parent (i.e. previous) commits.
git log <since>.. Equivalent to git log <since>..HEAD.
git log ..<until> Equivalent to git log HEAD..<until>.
git log Equivalent to git log HEAD.

This is only a subset of the available range specifications — see Revision Ranges for more options.

To only show commits which apply to a specified file or path, this can also be added on to the command line. Where there is any risk of being confused with a commit specification, it will need to be prefixed by -- and a space (Git should terminate with an appropriate error message if there is a risk of ambiguity):

git log mybranch -- path/within/repo

Any or all of the following options can be used to further filter the set of changes:

-<N> Show only the first <N> commits which would have otherwise been listed.
--after=<date> Show only commits after the specified date; --since is an alias.
The date format is flexible and can include relative dates.
--before=<date> Show only commits before the specified date; --until is an alias.
--author=<regexp> Filter author field by the specified regular expression.
--committer=<regexp> As --author but filters the committer field.
--grep=<regexp> Filter commit messages by the specified regular expression.
--all-match When several --grep patterns are given, show only commits matching all of them.
(Default is to show commits which match any of them; --author and --committer filters must always match as well.)
-i Perform case-insensitive pattern matching against <regexp>.
-E Interpret <regexp> as an extended regular expression.
-F Interpret <regexp> as a fixed string as opposed to a regular expression.

There are several options to change the output format:

--graph Show commits in an ASCII art graph indicating branch and merge history.
--all Show history of all branches, not just current one — often used with --graph.
--decorate Annotate changes with any refs (branches, HEAD, etc.) that point to them.
--name-status Include the list of changed files and whether they were modified, added or deleted.
--name-only shows the same but omitting the operation.
The file list can be filtered with --diff-filter.
-p Append the diffs introduced by the change to the entry.
--pretty=fmt Select output format where fmt is oneline, short, medium (default), full or fuller.
There are also additional options such as format to specify a printf()-like string.
See the git-log man page for more details.
--relative-date Show dates relative to the current point in time.
--stat Display a diffstat summary of each change.
Use --shortstat for just the one line summary.

There are a multitude of other options for both display and filtering, and this section has aimed to demonstrate only the most useful. See the git-log man page for more details.
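A few of the filtering and formatting options can be combined as in the sketch below. The author names, email addresses and commit messages are invented for the example.

```shell
set -eu
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo && cd repo

git commit -q --allow-empty --author="Alice <alice@example.com>" -m "Add parser"
git commit -q --allow-empty --author="Bob <bob@example.com>" -m "Fix parser bug"
git commit -q --allow-empty --author="Alice <alice@example.com>" -m "fix typo in docs"

by_alice=$(git log --author=Alice --format=%s | wc -l)   # filter on the author field
fixes=$(git log -i --grep=fix --format=%s | wc -l)       # case-insensitive message match
latest=$(git log -1 --format=%s)                         # only the first commit listed

git log --graph --all --decorate --pretty=oneline        # compact overview of all branches
```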

git diff

The git diff command is used to show differences between files in various locations. This is similar to git log in some ways, although git diff compares the differences in the endpoints rather than examining the full commit history. As a result, while the syntax appears superficially similar to the ranges used by git log, the interpretations differ.

The possible invocations are:

git diff Show the differences between the current working directory and the index.
git diff <commit> As git diff, but compare the working directory to the named commit instead of the index
(e.g. git diff HEAD to compare to most recent commit on the current branch).
git diff --cached [<commit>] Show the differences between the index and the current HEAD.
Optionally, a different commit than the HEAD can be specified.
git diff <from-commit> <to-commit>
git diff <from-commit>..<to-commit>
Show differences between two arbitrary commits (the two forms are equivalent).
Omitting either commit in the latter form causes Git to assume HEAD for it.
git diff <from-commit>...<to-commit> Show differences between the common ancestor of both commits and <to-commit>.
Omitting either commit causes Git to assume HEAD.
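The first three invocations differ only in which two locations are compared. A small sketch, assuming a hypothetical notes.txt with one committed line, one staged line and one unstaged line:

```shell
set -eu
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo && cd repo

printf 'one\n' > notes.txt
git add notes.txt && git commit -q -m "Add notes"
printf 'two\n' >> notes.txt && git add notes.txt   # staged change
printf 'three\n' >> notes.txt                      # unstaged change

# Keep only the added lines (the '+++' file header is filtered out).
unstaged=$(git diff | grep '^+[^+]')           # working directory vs index
staged=$(git diff --cached | grep '^+[^+]')    # index vs HEAD
both=$(git diff HEAD | grep '^+[^+]')          # working directory vs HEAD
printf '%s\n' "$both"
```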

As with git log, it's possible to supply one or more paths to filter the files diffed:

git diff -- path/in/repo

By default the git diff command shows all differences and produces output suitable for use as a patch. A variety of other options can be used to filter the diffs and alter the output format:

-b Ignore changes in amount of whitespace, use -w to ignore all whitespace.
--check Instead of displaying the diff, just report any whitespace errors introduced.
--dirstat Show summary by directory of how the changes are distributed.
-G <regexp> Show only hunks whose changed lines match specified regular expression.
--name-status Include the list of changed files and whether they were modified, added or deleted.
--name-only shows the same but omitting the operation.
The file list can be filtered with --diff-filter.
--raw Instead of producing a patch, produce the Raw Format output, one line per file:
:<src mode> <dst mode> <src sha1> <dst sha1> <status> <src path> <dst path>
Status: M: in-place edit, C: copy edit, R: rename edit, A: create, D: delete, U: unmerged
-S <needle> Shows only hunks which have either added or removed <needle>.
Can make this a regular expression by also passing --pickaxe-regex.
--stat Generate a diffstat summary for each file.
Use --shortstat for just the one line summary.

As well as the standard git diff command, it's also possible to invoke an external diffing application in a similar manner to git mergetool:

git difftool [-y] [--dir-diff]

By default this will iterate over each modified file and prompt the user whether they wish to launch the tool on that file — the prompt can be suppressed with the -y option. Alternatively, for diffing tools which support recursive diffs of directories the --dir-diff option can be used to allow this.

The --dir-diff option was added in Git version 1.7.11 so may not be available on older platforms.

Changing History

This section concerns making changes to changes which have already been committed to the repository. There are several mechanisms to do this which are covered in the following sections, although bear in mind that the commands presented here are intended as a useful subset of the possibilities that Git provides — should it be required, it's possible to get fairly raw access to the underlying repository directly.

Some of these commands can obliterate committed changes and/or revert files in your working directory — you are advised to use them with caution.

Reverting Commits

The git reset command can be used to move the state of a branch back to a previous commit, effectively obliterating the commits made in the meantime. This command has already been seen in the Staging Files section to revert files out of the index, but it can also be used to change the repository directly.

The reset command performs three separate actions in order, and different command-line options can be supplied to stop at any point. The three stages, and the options to stop at that action, are shown below:

--soft Move whatever the current HEAD points at back to the specified revision.
This doesn't just move HEAD, it actually changes the underlying branch.
The index and working directory are left alone.
--mixed Re-initialise the index to the state of the repository at the new HEAD location.
The working directory is left alone.
--hard Reset the working directory to match the new state of the index.

After performing a git reset --soft, you could repeat a git commit and easily re-instate the changes since the index is still in place.

After performing a git reset --mixed (which is the default if the option is omitted), you could repeat appropriate git add and git commit operations to re-instate the changes.

After performing a git reset --hard, all local changes are lost and the tree has effectively been fully reverted to a previous state.
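The three stopping points can be compared by undoing the same commit three times. This is a sketch in a scratch repository; the file names are assumptions and --porcelain is used only to make the status output easy to compare.

```shell
set -eu
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo && cd repo

echo a > a.txt && git add a.txt && git commit -q -m "Add a"
echo b > b.txt && git add b.txt && git commit -q -m "Add b"

git reset -q --soft HEAD~1
after_soft=$(git status --porcelain)    # "A  b.txt": the change is still staged
git commit -q -m "Add b again"

git reset -q --mixed HEAD~1
after_mixed=$(git status --porcelain)   # "?? b.txt": only in the working directory
git add b.txt && git commit -q -m "Add b once more"

git reset -q --hard HEAD~1
after_hard=$(git status --porcelain)    # empty: b.txt has been removed as well
```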

The git reset --hard command is actually quite similar to git checkout in some ways, but the latter only moves HEAD as opposed to moving the branch to which HEAD points. The latter is also working directory safe, unlike git reset --hard.

The git reset command can also accept one or more paths to limit the operation of the command to those paths, and doing so subtly changes the behaviour.

git reset -- path/in/repo

In this case, the first step of moving the HEAD branch is skipped, because that's not a meaningful thing to do on a subset of the repository, so it is invalid to supply paths with the --soft option. The --hard option is also disallowed because this is exactly the same function as git checkout with paths.

Metadata Changes

To change the most recent commit:

git commit --amend

This will update the most recent commit with your currently staged changes (those tracked with git add and git rm) and pop up the editor to change the commit comment. To change only the comment and leave the set of files alone, use:

git commit --amend -o

Also the --reset-author option can be used to alter the author of the commit in cases where this was incorrect.

This invocation is effectively a convenience as the same effect could be achieved by reverting the previous commit with git reset and then repeating it.
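As a sketch of the message-only form, the example below fixes a misspelt commit message while leaving a separately staged file out of the amended commit. The file names and messages are invented, and -m is used so that no editor is opened.

```shell
set -eu
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo && cd repo

echo one > file.txt && git add file.txt && git commit -q -m "Add fiel"
echo two > other.txt && git add other.txt   # staged, but not wanted in the amended commit

git commit -q --amend -o -m "Add file"      # -o: change the message, not the file set

count=$(git rev-list --count HEAD)          # the commit was replaced, not added to
message=$(git log -1 --format=%s)
still_staged=$(git diff --cached --name-only)
```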

Interactive Rebase

To change more than a single commit an interactive rebase is required. In the case of changing metadata, this works by picking a parent commit and rebasing all subsequent changes on top of it. In this example we'll rewrite some of the previous three commits, so the parent commit is HEAD~3 (i.e. the commit immediately before those three); see Specifying Revisions for details about this syntax.

Of course, a different branch can be supplied just like any other rebase operation. The key point here is that the rebase is performed interactively, allowing you to make changes to the commits as they're rebased.

To start the rebase operation, invoke the command with the -i option:

git rebase -i HEAD~3

Git should pop up the editor with the rebase script ready for changes. This is a specification of the changes that the rebase operation is about to apply in order (note: this is opposite to the reverse-chronological order that git log uses). Here is an example of a rebase script:

pick 57a1647 Added the FooBar module.
pick b1d8025 Fixed some bugs in the FooBar module.
pick f1be38b Optimised the main FooBar loop.

# Rebase e514ada..f1be38b onto e514ada
#
# Commands:
#  p, pick = use commit
#  r, reword = use commit, but edit the commit message
#  e, edit = use commit, but stop for amending
#  s, squash = use commit, but meld into previous commit
#  f, fixup = like "squash", but discard this commit's log message
#  x, exec = run command (the rest of the line) using shell
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out

The pick command on each line means that the commit will be applied unchanged by the rebase process. This command can be changed on any or all lines to alter the behaviour of the script. The options for replacing it are:

reword As the rebase script runs, Git will pop up the editor for any marked reword and allow you to edit the commit message.
edit More flexible than reword, this will cause Git to drop into the shell at the marked commit and allow you to run an arbitrary git commit --amend or git reset operation (since the change in question is the most recent one at this point during the rebase). This allows the set of files committed to be changed, the changes to be altered or commits to be split (by a git reset followed by new commits).
squash Any changes marked squash are merged with the one(s) above to leave only a single commit in the repository history. Git will pop up an editor showing all the separate commit messages to allow you to merge them however you wish.
fixup As squash but omits the commit message from the merged version suggested.

Once this file is saved, Git proceeds with the rebase as normal. Conflicts are handled as per a standard rebase and the edit action will also require the use of git rebase --continue to carry on the rebase operation.
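The rebase script can also be edited by a program rather than by hand, which is handy for trying the commands out. The sketch below relies on the GIT_SEQUENCE_EDITOR environment variable, which is not covered above, and uses sed to turn the second and third pick lines into fixup; the file name and helper function are assumptions.

```shell
set -eu
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo && cd repo

# Hypothetical helper: append a line to foobar.txt and commit it.
commit() { echo "$1" >> foobar.txt; git add foobar.txt; git commit -q -m "$1"; }

commit "Initial commit"
commit "Added the FooBar module."
commit "Fixed some bugs in the FooBar module."
commit "Optimised the main FooBar loop."

# Git runs this command on the rebase script instead of opening an editor.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,3s/^pick/fixup/'" git rebase -q -i HEAD~3

count=$(git rev-list --count HEAD)      # the three commits are now one
subject=$(git log -1 --format=%s)       # fixup keeps the first commit's message
```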

Squash Merges

Similar in some ways to the squash option of interactive rebasing is the --squash option to git merge. This allows multiple commits to be merged into a single commit on a different branch. The primary difference is that an interactive rebase modifies a source branch such that the destination could be fast-forwarded to it, whereas a squashing merge will create an entirely new commit on the destination branch which is the combination of all the source commits, leaving the individual commits intact on the source branch.

The syntax is the same as for a standard git merge with the addition of the --squash parameter. The result is that the index and working directory are updated with the combination of all the source commits but no commit is created, so it differs from a standard merge in that this must then be committed to the repository with git commit. In this way merges from multiple branches may be squashed together by repeated executions of git merge --squash prior to the git commit. Any conflicts will be flagged and must be resolved as with a standard merge before continuing.

Once the commit is done, the history of each source branch is left untouched and a new commit will have been created on the destination branch which is the combination of the selected commits. The source and destination branches will have effectively diverged. The situation will be something like the following example, which assumes that other commits have been made to master before doing the merge.

As can be seen, the source branch is intact and could be left in existence as required, but unlike a rebase its commits haven't become part of the history of the destination branch so future merges may well produce conflicts. It's also worth noting that, unlike a standard merge, the final commit(s) on the branch(es) do not become parents of the mainline commit. The merged commit could have equally been created by manually applying the diffs of each source commit and an entirely new commit created.
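A squash merge can be sketched as follows, assuming a feature branch with two commits and one further commit on master; all names are illustrative. The final check confirms that the new commit has a single parent.

```shell
set -eu
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master

# Hypothetical helper: add a one-line file and commit it.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base
git checkout -q -b feature
commit feature1
commit feature2
git checkout -q master
commit mainline

git merge -q --squash feature             # stage the combined changes, but do not commit
git commit -q -m "Squashed feature work"  # one ordinary commit on master

parents=$(git rev-list --parents -n 1 HEAD | wc -w)   # commit hash plus its parents
```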

Remote Repositories

Git is designed to work in a distributed fashion, where each Git repository is equal to all others — any decisions about which repository is canonical are a matter of user policy. As a result, Git's interactions with remote repositories are bidirectional in nature and changes can be both pushed and pulled.

One thing to bear in mind when creating a repository for remote use is that it's strongly recommended that a repository that will be accessed by multiple users be created as bare (i.e. with no working directory). See Creating Repositories for instructions on how to do this.

Remotes

Remote repositories are tracked by adding them as remotes to your repository. Each one can be assigned a short name, rather like branches. If you create a repository via git clone then it will already have the origin remote set up. To add new remotes:

git remote add <name> <url>

To list all current remotes, simply run:

git remote [-v]

With the -v option the URL of each remote is also shown — in this case, some entries may be split into two lines, in cases where there may be separate push and pull URLs.

Details about a specific remote can be listed with:

git remote show <name>

And remotes can also be renamed and removed:

git remote rename <old> <new>
git remote rm <name>

Remote Branches

Branches in remote repositories can be referred to via the syntax remote/branch. For example, to list changes which have been committed locally but not yet pushed up to the master branch on remote origin:

git log origin/master..master

These branches can be used with git checkout just as local branches can, but since the branches are locally read-only (they can only change due to commits on the remote repository) then checking them out will lead to a detached HEAD state. This means that HEAD is pointing directly at a commit rather than a mutable branch, so any git commit operations will be left dangling with no branch pointing to them. As a result, committing on a remote branch is not generally recommended (although Git won't prevent it).

If you have committed while HEAD is detached (i.e. when it doesn't refer to a mutable branch), you can preserve the commit by creating a new branch for it with git checkout -b <branch> before moving HEAD. After moving HEAD the commit can still be recovered via the reflog for a while, but it may be removed during garbage collection once nothing refers to it any longer.

To actually commit work on a remote branch, the best approach is to create a tracking branch locally — this is done by providing the remote branch as an extra argument to the git checkout command:

git checkout -b <name> <remote>/<branch>

It can be helpful to keep branch names consistent, and git checkout provides a shorthand for creating a tracking branch with the same name as a remote branch:

git checkout --track <remote>/<branch>

A tracking branch acts like a local branch but since it's implicitly tied to a remote branch then git push and git pull operations don't need additional arguments to operate as they default to the corresponding remote branch. Git will also show the relationship between the two branches in some cases: for example, a git status command on a remote tracking branch will also show how many commits the local tracking branch is ahead of or behind the remote branch.

Pulling

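To fetch information from a remote repository, use the git fetch command:

git fetch [<remote> | --all]

Without specifying a remote, changes are fetched from the default remote, which is origin unless the current branch has an upstream on a different remote. Use the --all option to fetch from all remotes.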

This command does not merge any remote changes into local branches: it simply updates the remote branches in the local repository to reflect the latest changes in the remote. Appropriate git merge or git rebase commands are then required to integrate those changes into local branches.

To fetch and merge in a single operation, the git pull command can be used:

git pull [<remote>]

This is equivalent to:

git fetch <remote>
git merge FETCH_HEAD

Where FETCH_HEAD is a pointer to the HEAD of the most-recently fetched remote branch. If the --rebase option is used, then the latter command becomes git rebase instead.

Some people prefer to avoid git pull and use git fetch first so that they can examine any changes before merging them. If you use git pull, you'll most likely be forced to resolve any conflicts before continuing with your development.
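The fetch-then-inspect workflow might look like the sketch below, which uses a second local repository to stand in for the remote. The directory names upstream and local are assumptions.

```shell
set -eu
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q upstream
(cd upstream && git symbolic-ref HEAD refs/heads/master \
             && git commit -q --allow-empty -m "first")
git clone -q upstream local                               # sets up the origin remote
(cd upstream && git commit -q --allow-empty -m "second")  # new work appears remotely

cd local
git fetch -q origin                                     # updates origin/master only
incoming=$(git log --format=%s master..origin/master)   # review before merging
git merge -q origin/master                              # a fast-forward in this case
```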

It is also possible to provide the URL of a remote repository for git pull — this pulls the appropriate changes and merges them without saving a persistent reference to the remote repository:

git pull <url> [<ref>]

This can be useful for grabbing one-time changes. Typically in this case, the <ref> argument will be used to specify a remote branch or commit.

Pushing

To push changes out to a remote branch, use:

git push <remote> <branch>

The arguments may be omitted if the current branch is a tracking branch, in which case the upstream branch tied to it is used. To set this upstream the git branch --set-upstream-to command can be used (older versions of Git spelled this --set-upstream), but often it's more convenient to specify the -u option to git push:

git push -u <remote> <branch>

The <branch> specification above assumes that the branch name is the same on both repositories and is actually a shorthand for the full specification:

git push <remote> [+]<local branch>:<remote branch>

The <local branch> here is typically a local branch name, but could be any commit specification (e.g. a SHA1 hash). The <remote branch> must be a real branch, however, to avoid creation of a detached head, but if a non-existent branch is specified then it will be created based on the source branch.

It's worth noting at this point that you should avoid modifying changes (for example, via an interactive rebase) once they've been pushed or pulled to another repository unless you maintain both and are prepared to deal with the resultant conflicts.

Normally the push operation fails if the remote merge operation is anything other than a fast-forward (i.e. the local repository has already merged all of the remote changes). If this is not the case, it will exit with an appropriate error and the remote changes will need to be merged locally first. If the branch specification is prefixed with +, the update will be done even if the merge is not a fast-forward.

Performing non-fast-forward merges can cause changes to be lost in the remote repository, if those changes haven't been first merged locally — for this reason, the configuration option receive.denyNonFastForwards is often set on shared repositories to reject them.

Specifying a blank <local branch> causes the remote branch to be deleted (mnemonic: make the remote branch “nothing” to delete it):

git push <remote> :<remote branch>

Deleting remote branches can lose changes and also be used to implement non-fast-forward changes if otherwise denied (i.e. delete then replace branch) — for this reason, the configuration option receive.denyDeletes may be set on shared repositories to disallow deletion of branches or tags for all users. Hooks may also be used to implement more selective ACLs instead.

As a special case, leaving both <local branch> and <remote branch> empty will push changes between all “matching” branches — i.e. for each local branch, a remote branch of the same name is updated if it already exists.
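The push forms above can be tried against a local bare repository. In this sketch hub.git, work and the review branch are invented names; the bare repository simply stands in for a shared remote.

```shell
set -eu
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q --bare hub.git
git init -q work && cd work
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "first"
git remote add origin ../hub.git

git push -q -u origin master        # create the remote branch and set it as upstream
upstream=$(git rev-parse --abbrev-ref 'master@{upstream}')

git push -q origin master:review    # push to a differently-named remote branch
git push -q origin :review          # blank source: delete the remote branch again
heads=$(git ls-remote --heads origin | wc -l)
```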

Diffcore

Several parts of Git involve comparing two sets of files to show the differences in some form — the obvious commands are `git diff` and `git log`. All of these commands share a common subsystem for this task and this subsystem is described here.

The system operates in stages, the first of which is filtering by path. This limits the pairs of files on which the core operates and any paths outside the set are simply dropped from consideration at this point.

After that, a normal diff operation is performed and the list of differences is then fed through five more stages. These are summarised in the sections below, along with the command-line option which controls them.

diffcore-break

-B[<n>][/<m>]
--break-rewrites[=[<n>][/<m>]]

Detects major changes to whole files and reconsiders large-scale modification changes as a deletion followed by a separate addition, as opposed to smaller diffs between bits of context which happen to match. Often used with -M, in which case rewritten files can be used as the source of a rename (typically only deleted files are considered). See also diffcore-merge-broken.

The number <n> controls the percentage of file which must change to be considered a rewrite, where the sum of both deletions and additions is counted. For the meaning of <m> see diffcore-merge-broken.

diffcore-rename

-M[<n>]
--find-renames[=<n>]
-C[<n>]
--find-copies[=<n>]

Detect and report renaming (-M) and/or copying (-C) of a file on a per-commit basis (use --follow for extending history beyond renames). If specified, <n> is the percentage of the file which should be unchanged for the operation to be considered a copy/rename. Note that -C implies -M.
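Rename detection can be observed by comparing the output with and without it. A minimal sketch, assuming a small file which is renamed with git mv; --no-renames (not covered above) switches detection off for the comparison.

```shell
set -eu
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo && cd repo

printf 'line one\nline two\nline three\n' > old.txt
git add old.txt && git commit -q -m "Add old.txt"
git mv old.txt new.txt

with_renames=$(git diff --cached -M --name-status)        # one rename entry (R100)
without=$(git diff --cached --no-renames --name-status)   # an addition and a deletion
printf '%s\n' "$with_renames" "$without"
```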

diffcore-merge-broken

Used with diffcore-break, this stage converts changes broken up by that stage but not matched by any renaming checks back into a single modification. The parameter <m> to the -B option specifies the maximum percentage of the original file which may be deleted for the change to be re-merged in this fashion.

diffcore-pickaxe

-S<string>

This stage removes diffs where the specified string occurs the same number of times in the source and destination. The --pickaxe-regex option interprets the string as a regex instead of a fixed string. The --pickaxe-all option is also relevant, which shows all changes in files which have at least one matching change as opposed to just the matching changes.

Note that this operation is subtly different from the -G option, which simply filters diffs based on a specified regular expression appearing in either source or destination lines.

diffcore-order

-O<file>

Re-orders diffs according to glob patterns specified in a file. The globs are matched against filenames in the diff set and diffs are ordered based on the first match in the file. Filenames which don't match are output last.

Coloured Output

The colour settings in Git can be customised using configurables under the color section — for example, color.diff.meta indicates the colour used for diff metainformation.

Each colour is specified as:

git config --global color.diff.meta "red black bold"

The only mandatory argument in the list is the first, which specifies the foreground colour. The second colour, if specified, indicates the background colour. These are from the following list: black, red, green, yellow, blue, magenta, cyan and white. The other argument specifies an optional attribute, which is one of: bold, dim, ul, blink and reverse.

The hierarchy is shown below:

color.branch Used when listing branches.
.current The current branch.
.local Non-current local branch.
.remote A remote tracking branch.
.plain Any other ref.
color.diff Used for diff and log output.
.plain Unchanged context lines in the diff.
.meta Per-file metainformation.
.frag Per-hunk header information (e.g. line numbers).
.func Function name within hunk header.
.old Lines which have been removed by the change.
.new Lines which have been added by the change.
.commit Commit headers.
.whitespace Highlight used for whitespace errors.
color.decorate Used for git log --decorate additions.
.branch Local branch.
.remoteBranch Remote tracking branch.
.tag Tag.
.stash Stash reference.
.HEAD HEAD reference.
color.grep Used for git grep output.
.context Non-matching context lines.
.filename Name of matching file (unless using -h).
.function Name of containing function (if using -p).
.linenumber Line number of match (if using -n).
.match Matching text within line.
.selected Non-matching text within matching lines.
.separator Separators between fields and hunks.
color.interactive Used for git add --interactive output.
.prompt Action prompts and letter codes.
.header Header lines.
.help Help text.
.error Error messages.
color.status Used for git status output.
.header Header lines.
.added Files which are added but not committed.
.updated As .added.
.changed Files which are modified but not in the index.
.untracked Files which are in the working directory but not in the index or repository.
.branch The current branch.
.nobranch The “no branch” warning.

Git Subversion

There exists a popular extension to Git to allow a Git repository to act as a front-end to a Subversion repository. This can be used for those who wish to use Git for their day-to-day operations but still need to commit files back to SVN periodically, or as a tool to transition from an SVN repository to Git without losing any commit history.

Initial Setup and Updates

The git-svn extension often needs to be installed — for example, on Ubuntu:

sudo apt-get install git-svn

As an alternative, it's included in recent versions of Git — see Installing Latest Git for details.

To start, create a clone of the remote SVN repository with git svn clone:

git svn clone svn+ssh://user@host/path -s

This operation may take hours on a large repository with a lot of history. If you need a faster clone, the -rN option can be used to start history at revision N. However, the history before this point will be permanently lost to the Git repository thus created.

The -s option assumes that the standard SVN subdirectories trunk, branches and tags exist and are used for their conventional purposes. If different directories are used, the -T, -b and -t options can be used to specify the trunk, branches and tags directories respectively.

The clone command is actually shorthand for a git svn init (which takes the options described above) followed by an immediate git svn fetch which copies commits from the SVN repository into Git. Since the initial clone can take hours or days for a large repository, it's possible to interrupt it and resume the fetch operation later.

Indeed, the fetch can be run at any point, to grab the latest commits from the SVN repository:

git svn fetch

This will cause Git to fetch every branch from the SVN repository into the appropriate remote tracking branches in the Git repository, but won't actually merge these changes into the current branch. This can be done afterwards with a standard git rebase.

To combine the fetch and rebase into a single step:

git svn rebase

This is equivalent to performing two other operations:

  1. A git svn fetch --parent, which updates only the SVN parent of the current git branch from the source repository.
  2. A fast-forward of the current branch to the new state followed by a rebase of any changes not yet committed to the SVN repository.

Before carrying out this command, it's important to make sure there are no uncommitted changes in the working directory. If necessary, use git stash or git commit to move them out of the way.

Repository Layout

Both SVN branches and tags are represented as Git branches (i.e. SVN tags do not become Git tags). Unlike standard Git remote branches, the names of the branches are not prefixed with the name of the remote repository. For example, under a standard Git repository the output of git show-ref might be similar to:

83e38c7a0af325a9722f2fdc56b10188806d83a1 refs/heads/master
3e15e38c198baac84223acfc6224bb8b99ff2281 refs/remotes/gitserver/master
0a30dd3b0c795b80212ae723640d4e5d48cabdff refs/remotes/origin/master
25812380387fdd55f916652be4881c6f11600d6f refs/remotes/origin/testing

However, in a repository cloned from SVN the branches appear as top-level remote branches, on the assumption that there won't be multiple remotes:

1cbd4904d9982f386d87f88fce1c24ad7c0f0471 refs/heads/master
aee1ecc26318164f355a883f5d99cff0c852d3c4 refs/remotes/my-calc-branch
03d09b0e2aad427e34a6d50ff147128e76c0e0f5 refs/remotes/tags/2.0.2
1cbd4904d9982f386d87f88fce1c24ad7c0f0471 refs/remotes/trunk

This caveat doesn't necessarily apply if the repository structure has been changed from its default configuration — see Nonstandard Subversion Configuration.

Nonstandard Subversion Configuration

Some SVN repositories have exceedingly non-standard layouts, such as nested trees for branches and multiple projects at the top level. Some of these issues can be dealt with by editing the configuration for the SVN repository prior to performing a git svn fetch — the sequence is typically:

  1. Perform git svn init <url> to create the blank repository.
  2. Edit the [svn-remote "svn"] section of the .git/config file produced.
  3. Go ahead and perform the git svn fetch operation.

The svn-remote config file section has four main configurables:

url Specifies the URL of the subversion repository, set by git svn init and left alone.
fetch Maps the remote SVN trunk to a remote tracking branch in Git.
branches Maps SVN branches to remote tracking branches in Git, may be specified multiple times.
tags As branches but maps tags, may be specified multiple times.

To illustrate some of these settings, let us assume a repository with the following structure:

  • projectA
    • trunk
    • branches
      • releases
        • 1.x
          • 1.0
          • 1.1
        • 2.x
          • 2.0
      • development
        • featureX
        • featureY
    • tags
      • 1.0-1
      • 1.0-2
      • 1.1-1
      • 1.2-1
      • 1.2-2
      • 2.0-1
  • projectB

The configuration examples below assume git-svn version 1.6.x or later, where support for multiple wildcards was added.

Assuming that the user wishes to create a Git repository which covers only Project A, the contents of the svn-remote configuration for this setup could be:

[svn-remote "svn"]
        url = svn+ssh://user@host/repo
        fetch = projectA/trunk:refs/remotes/svn/trunk
        branches = projectA/branches/releases/*/*:refs/remotes/svn/*
        branches = projectA/branches/development/*:refs/remotes/svn/development/*
        tags = projectA/tags/*:refs/remotes/svn/tags/*

This configuration places all the Subversion references under a common svn remote with the remote trunk in the remotes/svn/trunk branch. The set of branches in releases is reflected as Git branches under remotes/svn/ — for example, remotes/svn/1.x/1.1. The branches under development are reflected as, for example, remotes/svn/development/featureX. The tags are all mapped under a tags subdirectory — for example, remotes/svn/tags/1.2-1.
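Instead of editing .git/config by hand, the same entries can be written with git config. The sketch below uses a plain git init as a stand-in for git svn init so that it runs without git-svn installed; the URL and mappings are copied from the example configuration above.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
# Stand-in for "git svn init <url>" (assumption: git-svn may not be installed).
git init -q

git config svn-remote.svn.url "svn+ssh://user@host/repo"
git config svn-remote.svn.fetch "projectA/trunk:refs/remotes/svn/trunk"
# --add appends a further value instead of overwriting the existing one,
# which is how a key ends up specified multiple times.
git config --add svn-remote.svn.branches \
    "projectA/branches/releases/*/*:refs/remotes/svn/*"
git config --add svn-remote.svn.branches \
    "projectA/branches/development/*:refs/remotes/svn/development/*"
git config svn-remote.svn.tags "projectA/tags/*:refs/remotes/svn/tags/*"

# Read the section back to check what was written.
git config --get-regexp '^svn-remote\.svn\.'
branch_maps=$(git config --get-all svn-remote.svn.branches | wc -l | tr -d ' ')
trunk_map=$(git config --get svn-remote.svn.fetch)
echo "branches entries: $branch_maps"
```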

If the set of branches is later changed after the repository has already been fetched then the new branch will not be reflected. To fix this, edit .git/svn/.metadata to move branches-maxRev back to before the branch was created in the SVN repository and then re-run git svn fetch.

Git Subtree

Git has an inbuilt system for managing nested repositories called submodules. However, these have various issues and quirks which make them somewhat less than useful in some cases. As a replacement, the git subtree command was created which allows external repositories to be imported as a subdirectory of a current repository. Unlike submodules, these can be treated as standard files by those cloning the repository, but at the same time changes can be shared bidirectionally between the subdirectory and the external repository.

Installing Latest Git

Currently this feature is an extension which needs to be separately installed unless you're using a very recent version of Git. I would suggest using the current version, however, as the command is now included in the mainline tree and using the latest version will include all of the latest bug fixes and features. To do this on Ubuntu:

  1. Install some build dependencies of Git:
    sudo apt-get install libcurl4-gnutls-dev libexpat1-dev gettext libz-dev libssl-dev asciidoc
  2. In an appropriate location, clone the Git repository:
    git clone https://github.com/git/git.git
  3. Enter the working directory and view the list of tags:
    cd git
    git tag
  4. Select an appropriate tag; the Git homepage shows the current version. For the purposes of this example we'll use the tag v1.7.12.4.
  5. Checkout the tree to the selected tag:
    git checkout v1.7.12.4
  6. Perform a build targeted at /usr/local:
    make prefix=/usr/local all
  7. Assuming the build succeeded, install the resultant binaries:
    sudo make prefix=/usr/local install
  8. Voila, assuming /usr/local/bin is earlier in your path than /usr/bin the newly-built version should be used in preference (hint: try running hash -r under bash).
  9. Now install the git subtree extension:
    cd contrib/subtree
    make
    sudo make install
    sudo mkdir -p /usr/local/share/man/man1
    sudo make install-doc

Adding and Updating Subtrees

There are a couple of prerequisites for adding a subtree:

  • You must be at the top level of the git repository (i.e. in the same place as the .git directory).
  • Your repository must be clean — you can use git stash to achieve this if necessary.

To add a new subtree first add a reference to the remote repository and fetch it locally:

git remote add <name> <url>
git fetch <name>

It's not strictly necessary to add the other repository as a remote, but it removes the need to remember the remote path and means that there's an authoritative local copy of the remote. The downside is that the local repository requires more space, as it contains every branch and tag from the remote one. To skip this, specify a remote URL instead of <name>/master in the commands which follow.

Now clone that repository into another directory in the local one:

git subtree add --prefix=path/in/local/repo --squash <name>/master

This will create a directory path/in/local/repo in the local repository and make its contents a clone of the specified remote repository's master branch. The --squash option squashes the history of the remote repository into a single local commit — this is often useful for keeping the history of the local repository clean, but is entirely optional.

To later merge changes from the remote repository into the local one:

git fetch <name>
git subtree merge --prefix=path/in/local/repo --squash <name>/master

Merging Changes Upstream

Once changes have been made in the local version of the subtree then it may be desirable to merge these back into the upstream repository. To do this the history of the repository must be filtered such that only changes to the subtree are included and the filenames are moved to the root of the repository. The git subtree split command allows this.

Based on the previous commands using a remote of <name> and a subtree in path/in/local/repo, then the following command will create a history suitable for merging into the upstream:

git subtree split --prefix=path/in/local/repo --branch=export

This creates a synthetic set of commits which are the filtered versions of those affecting the subtree in the repository. With the --branch option it also creates a new branch to track the head of this synthetic commit history.

The commits are identical across multiple split operations, so repeated splits can be used without risk of duplicating commits.

At this point to merge the changes into the upstream repository, perform a push from the local branch just created to the remote branch (assuming master in this example):

git push <name> export:master

Of course, any branch name can replace export locally, and it's also possible to push to a different remote branch if the changes must be adapted before merging.
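The whole round trip (add, local change, split, push) can be tried safely against a throwaway upstream. This is only a sketch: it assumes git subtree is installed, and the repository names (lib.git, project), the vendor/lib prefix and the remote name lib are all invented for the example.

```shell
set -e
# Placeholder identity so commits work in a clean environment.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

work=$(mktemp -d)
cd "$work"

# A stand-in upstream library, published as a bare repository.
git init -q lib-src
cd lib-src
git symbolic-ref HEAD refs/heads/master
echo "library v1" > lib.txt
git add lib.txt
git commit -q -m "Library: initial version"
cd ..
git clone -q --bare lib-src lib.git

# The local project which embeds the library.
git init -q project
cd project
git symbolic-ref HEAD refs/heads/master
echo "project" > README
git add README
git commit -q -m "Project: initial commit"

# Import the library as a subtree.
git remote add lib "$work/lib.git"
git fetch -q lib
git subtree add --prefix=vendor/lib --squash lib/master

# Change the embedded copy with an ordinary commit.
echo "library v2" > vendor/lib/lib.txt
git commit -q -am "Library: local improvement"

# Filter the subtree history on to a branch, then push it upstream.
git subtree split --prefix=vendor/lib --branch=export
git push -q lib export:master

upstream_subject=$(git --git-dir="$work/lib.git" log -1 --format=%s master)
upstream_content=$(git --git-dir="$work/lib.git" show master:lib.txt)
echo "upstream head: $upstream_subject ($upstream_content)"
```

The upstream is created bare because Git refuses by default to push to the checked-out branch of a non-bare repository.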

Git Structure

This page is intended to be more of a “how-to” guide than an in-depth discussion of Git internals. Nevertheless, in a system as flexible as Git it's useful to know at least a little structure to put the operation of the commands into context. This section covers the basic concepts that expert Git users should expect to know (basic users can probably get away with less) using some basic commands as examples. The specifics of the commands will be covered in later sections, but the meaning should be relatively obvious to anybody who's used other source control systems.

Note: this section presents something of an abstraction of the Git structure, and some of the details may not be quite accurate. As with any complex system, the underlying specifics may vary.

A git repository can be thought of in three parts:

Object store This is a key/value data store which holds all important objects — files, commit comments, branches, everything. In essence, this is the repository.
Index This is a staging area which holds files which have been added but not yet committed.
Working directory This is the current checkout which holds files that the user can edit.

Starting with an empty repository, the object store, index and working directory all contain no files. When a file is created and populated in the working directory, it can then be added via the git add command. At this point, the file's data is added to the object store (referenced by a SHA1 hash, like all Git objects) and a reference is added to the index to track the object. This file isn't yet in the repository proper, only in the index, but a copy is in the object store so further edits to the file in the working directory will not change the yet-to-be-committed file. If the git add command were to be repeated after editing the file, a new copy would be written to the object store and the index updated to reference it; the old copy is then unreferenced and is eventually removed by garbage collection.
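The effect of git add on the index and object store can be inspected directly. The following is a small sketch in a scratch repository, with a made-up file name.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "version 1" > notes.txt
git add notes.txt

# Each index entry records the mode, the object hash, a stage number and the path.
git ls-files --stage
staged_hash=$(git ls-files --stage notes.txt | cut -d' ' -f2)

# The hash refers to an object which is already in the object store.
object_type=$(git cat-file -t "$staged_hash")

# Editing the working copy does not touch the staged object.
echo "version 2" > notes.txt
staged_content=$(git cat-file -p "$staged_hash")

echo "staged $object_type contains: $staged_content"
```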

After the file has been added to the index, the situation would be something like this:

From this position, a git commit would create a tree object in the object store based on the current index — typically this would reference multiple files, but in this example only a single one has been added. It also creates a commit object to hold the commit comment, author, etc. and this references the tree:

Notice that the commit is tracked by a branch called master — this is the Git equivalent of “tip” or “top of tree” in other source control systems, although the name is really just a convention as “trunk” is in Subversion. As shown in the diagram, a branch in Git is simply a reference to a commit which represents the head of that branch. The HEAD object is a pointer to a branch which represents the currently checked out branch — i.e. that which is currently reflected in the working directory.
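The objects and references described so far can be examined after a first commit. This is a sketch only: the branch name is forced to master to match the text, and the identity and file are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Make sure the initial branch is called master, whatever the local default is.
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "First commit"

# The commit object holds the tree hash, author, committer and comment.
git cat-file -p HEAD
# The tree object has one entry per file: mode, type, hash and name.
git cat-file -p 'HEAD^{tree}'

commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t 'HEAD^{tree}')
# HEAD points at a branch, and the branch points at a commit.
head_target=$(git symbolic-ref HEAD)
branch_commit=$(git rev-parse refs/heads/master)
head_commit=$(git rev-parse HEAD)

echo "HEAD -> $head_target -> $branch_commit"
```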

Now suppose that the working file was modified and git add run again — the situation would then become:

A git commit operation at this point would then create new commit and tree objects with the commit linked to the earlier one as its parent:

At this point it would be useful to briefly demonstrate a branch, although later sections will cover the process in more detail. If the user were to run git checkout -b mybranch, Git would create a new branch pointing at the same commit as the current HEAD and also move HEAD to point to it (i.e. change the working directory to be on that branch). Note that the files in the working directory won't change (assuming there were no uncommitted local modifications) because the two branches are currently identical:

Following the processes already outlined, committing files on to the new branch will add to the chain of commits but the master won't be changed:

Finally, a git checkout master followed by a commit on that shows how the chain of commits can fork:
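The branching sequence above can be replayed in a scratch repository to watch the history fork. The file names and commit comments below are invented for the example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "Commit on master"
base=$(git rev-parse HEAD)

# The new branch starts out pointing at the same commit as master.
git checkout -q -b mybranch
echo "two" >> file.txt
git commit -q -am "Commit on mybranch"

# Committing on master as well makes the two chains diverge.
git checkout -q master
echo "three" > other.txt
git add other.txt
git commit -q -m "Second commit on master"

# The most recent common ancestor is the commit where the branch was created.
fork_point=$(git merge-base master mybranch)
master_tip=$(git rev-parse master)
branch_tip=$(git rev-parse mybranch)

git log --graph --oneline --all
```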

Git Internals

This section contains more information on the internal structure of Git and its implementation. Most people won't need to consult this section except for interest or if they need to deal with Git at a low level (e.g. to repair a broken repository or debug problems).

The .git directory

The entire Git repository is stored in the .git directory of a standard repository, or simply at the top level of a bare repository. When initially created, it contains the following files and subdirectories:

HEAD Points to the currently checked out branch.
config This file contains repository-specific configuration options.
description This is used by the GitWeb interface.
branches/ A deprecated means of adding shorthand names for repository URLs.
hooks/ Client- or server-side hook scripts.
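index Where staging area information is stored (a file rather than a directory, created once something is first staged).
info/ Contains a per-repository exclusions file (info/exclude) for patterns which shouldn't be recorded in a tracked .gitignore.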
objects/ Where the actual repository content is stored (file content, commit objects, etc.).
refs/ Pointers to commit objects in the object store.

More information on the repository structure can be found in the gitrepository-layout man page. A useful subset of these directories will be discussed in the following sections.

Object Store

The primary storage of a Git repository is the objects/ directory. Aside from pack and info (which will be mentioned later), subdirectories of this are the leading two characters from the SHA1 hashes of objects stored in the database.

To manually add objects to the store, the hash-object command can be used:

echo "sample content" | git hash-object -w --stdin

The -w option here instructs hash-object to actually store the file in the database, otherwise it simply displays what the SHA1 hash would have been. Looking inside the objects/ directory at this point will reveal the newly-created file. The cat-file command can then be used to dump out the object's contents:

git cat-file -p <hash>

Each object in the store has a type and the type of plain file content like this is a blob. The type of an object can be queried with:

git cat-file -t <hash>
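Putting these commands together, the sketch below stores a blob in a scratch repository, locates the file it was written to and reads it back. It assumes the object is stored loose (not packed), which is the case for a freshly written object.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Store the content and capture the hash that hash-object prints.
hash=$(echo "sample content" | git hash-object -w --stdin)

# Loose objects live at objects/<first two hex digits>/<remaining digits>.
dir=$(printf '%s' "$hash" | cut -c1-2)
file=$(printf '%s' "$hash" | cut -c3-)
ls -l ".git/objects/$dir/$file"

obj_type=$(git cat-file -t "$hash")
obj_content=$(git cat-file -p "$hash")
echo "$hash is a $obj_type containing: $obj_content"
```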

Tree Objects

FIXME

Commit Objects

FIXME

Object File Format

FIXME

References

FIXME

Packing

FIXME

Sample Files

.gitconfig
[user]
        name = Andy Pearce
[merge]
        tool = vimdiff
        defaultToUpstream = true
[diff]
        tool = vimdiff
[color]
        ui = true
[color "branch"]
        current = green bold
        local = green
        remote = black bold
        plain = red
[color "diff"]
        plain = white
        meta = yellow blue bold
        frag = white blue bold
        func = white blue
        old = red
        new = green
        commit = yellow
        whitespace = red reverse
[color "decorate"]
        branch = green
        remoteBranch = black bold
        tag = red
        stash = blue bold
        HEAD = yellow bold
[color "grep"]
        context = cyan
        filename = magenta
        function = white blue bold
        linenumber = green
        match = yellow bold
        selected = white
        separator = cyan
[color "interactive"]
        prompt = white bold
        header = blue bold
        help = green
        error = red bold
[color "status"]
        header = blue bold
        added = green
        updated = green
        changed = red bold
        untracked = cyan
        branch = green bold
        nobranch = white red bold
[init]
        templatedir = /home/apearce16/.git_template
[alias]
        aliases = "!git config --list | grep ^alias"
        branches = for-each-ref --sort=-committerdate --format='%(color:cyan)|%(committerdate:relative)| %(color:yellow bold)%(refname:short)' refs/heads
        branchgraph = log --graph --abbrev-commit --decorate --simplify-by-decoration --date=relative --all --pretty=format:"%C(yellow)%h\\ %C(cyan)|%ad|%C(yellow)%C(bold)%d\\ %Creset%s%C(green)\\ [%an]%C(blue)%C(bold)\\ <%cn>"
        branchhistory = "!git showline HEAD; git showline @{-1}; git showline @{-2}; git showline @{-3}; git showline @{-4}; git showline @{-5}"
        changed = diff-tree -r --no-commit-id --name-only --relative
        changes = log --pretty=format:"%C(yellow)%h\\ %C(cyan)|%ad|%C(yellow)%C(bold)%d\\ %Creset%s%C(green)\\ [%an]%C(blue)%C(bold)\\ <%cn>" --decorate --stat --date=relative
        commits = log --pretty=format:"%C(yellow)%h\\ %C(cyan)|%ad|%C(yellow)%C(bold)%d\\ %Creset%s%C(green)\\ [%an]%C(blue)%C(bold)\\ <%cn>" --decorate --date=relative
        ctags = !.git/hooks/ctags
        fe = fetch --prune
        ff = merge --ff-only
        graph = log --graph --abbrev-commit --decorate --date=relative --all --pretty=format:"%C(yellow)%h\\ %C(cyan)|%ad|%C(yellow)%C(bold)%d\\ %Creset%s%C(green)\\ [%an]%C(blue)%C(bold)\\ <%cn>"
        grepall = "!git grep --full-name -n -p"
        showline = log --oneline --decorate -1
        summary = show --name-status
[push]
        default = upstream
[rebase]
        autosquash = true

[difftool]
        prompt = false

The following is a sample .gitignore file. To install it globally, place it in the home directory and set the core.excludesfile configurable to point to it:

.gitignore
# Ignore compilation by-products
*.[oa]
*.so
*.pyc

# Ignore editor-created files
.*.swp
tags
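Setting this up might look like the sketch below. To keep it harmless, HOME is pointed at a scratch directory so that the real ~/.gitconfig is not modified; drop those lines to apply the setting for real. The ignore patterns are a subset of the sample above.

```shell
set -e
scratch=$(mktemp -d)
# Sandbox only: redirect the "global" configuration to a scratch directory.
export HOME="$scratch"
unset XDG_CONFIG_HOME

# A cut-down global ignore file in the (sandboxed) home directory.
cat > "$HOME/.gitignore" <<'EOF'
*.[oa]
*.pyc
EOF
git config --global core.excludesfile "$HOME/.gitignore"

# Any repository now honours the global patterns.
git init -q "$scratch/repo"
cd "$scratch/repo"
touch main.c main.o
untracked=$(git status --porcelain)
echo "$untracked"
```

Only main.c should be reported as untracked, since main.o matches a global pattern.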

Recipes

The following are some handy pre-canned commands for specific situations. You can set up aliases for these as well.

Housekeeping

A good pair of commands to run in quick succession (aliased as fe and ff in the sample .gitconfig): fetch with pruning of stale remote tracking branches, then fast-forward the current branch if possible:

git fetch --prune
git merge --ff-only

Searching

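Search tracked files, showing line numbers and enclosing function names, with paths reported relative to the repository root. Run it from the top level to cover the entire repository (as a shell alias, such as grepall in the sample .gitconfig, it always runs from there):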

git grep --full-name -n -p <needle>

Listing commits

Show recent commits with relative dates and authors:

git log --decorate --date=relative \
        --pretty=format:"%C(yellow)%h %C(cyan)|%ad|%C(yellow)%C(bold)%d %Creset%s%C(green) [%an]%C(blue)%C(bold) <%cn>"

Show the history of a specified file:

git log -p -- <filename>

Show a graph of all commits:

git log --graph --abbrev-commit --decorate --date=relative --all \
        --pretty=format:"%C(yellow)%h %C(cyan)|%ad|%C(yellow)%C(bold)%d %Creset%s%C(green) [%an]%C(blue)%C(bold) <%cn>"

As above, but only commits which are the head of a branch:

git log --pretty=format:"%C(yellow)%h %C(cyan)|%ad|%C(yellow)%C(bold)%d %Creset%s%C(green) [%an]%C(blue)%C(bold) <%cn>" \
        --graph --abbrev-commit --decorate --simplify-by-decoration --date=relative --all 

Listing Files

Show files that changed between two commits in the current directory:

git diff-tree -r --no-commit-id --name-only --relative <commit1> <commit2>

Listing Branches

List branches in order of last commit time, most recent first (useful for removing dead branches):

git for-each-ref --sort=-committerdate \
                 --format='%(color:cyan)|%(committerdate:relative)| %(color:yellow bold)%(refname:short)' \
                 refs/heads
quickref/git.txt · Last modified: 2015/07/02 11:36 by andy