Git in “almost does what you expect” shocker

18 Mar 2013 at 12:01PM in Software

Git’s filename filtering surprisingly works as you’d expect. To an extent.


I recently had cause to find the Git commits that affected files whose names matched a particular pattern. In a classic case of trying to be too clever, I hunted through all the documentation in a vain effort to find some option which would meet my requirements. You can filter by commit message, by changes added or removed within the diff, by commit time or a bundle of other options, but I couldn’t find anything that matched on a filename pattern.

I was just about to ask a question on Stack Overflow when it suddenly occurred to me to simply try the standard commit filtering by filename, but using a shell glob instead of a full filename. I was surprised to see that it simply worked, though perhaps I shouldn’t have been.

So, if you want to find changes to, say, any file with a .txt extension in your repository you can use this:

git log -- '*.txt'

This behaviour is perhaps a little surprising, because the * matches any character, including slashes, which is why this pattern works recursively across the repository. Intrigued as to what was going on, I did a little judicious ltracing and found that Git does indeed call fnmatch() to match filenames in this case. Further, it doesn’t pass any flags to the call. In particular it doesn’t pass FNM_PATHNAME, which would stop wildcards from matching path separators.
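The effect of the missing flag is easy to reproduce by calling fnmatch() directly. The following is a minimal sketch rather than anything lifted from Git: the helper names and the example paths are made up for illustration, and only fnmatch() and FNM_PATHNAME themselves come from the C library.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>

/* Hypothetical helper: match with no flags, as the ltrace output
 * described above suggests Git does. fnmatch() returns 0 on a match. */
static bool matches_without_flags(const char *pattern, const char *path)
{
    return fnmatch(pattern, path, 0) == 0;
}

/* Hypothetical helper: match with FNM_PATHNAME, so that wildcards
 * are not allowed to match a '/' in the path. */
static bool matches_with_pathname_flag(const char *pattern, const char *path)
{
    return fnmatch(pattern, path, FNM_PATHNAME) == 0;
}

/* A few illustrative cases; the paths are invented for the example. */
static void check_examples(void)
{
    /* Without flags, '*' happily crosses directory boundaries. */
    assert(matches_without_flags("*.txt", "notes.txt"));
    assert(matches_without_flags("*.txt", "docs/old/notes.txt"));

    /* With FNM_PATHNAME, '*' stops at each '/'. */
    assert(matches_with_pathname_flag("*.txt", "notes.txt"));
    assert(!matches_with_pathname_flag("*.txt", "docs/old/notes.txt"));
}
```

Calling check_examples() from a small main() should run silently, which is the whole point: the same pattern accepts the nested path in one mode and rejects it in the other.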

The slightly quirky thing I then observed is that Git appears to notice whether you’ve used any wildcards and decides whether or not to use fnmatch() on that basis, presumably to make operations that don’t use wildcards faster. I tried digging around in the source code and believe I’ve located where the matching is done: the slightly fearsome-looking tree_entry_interesting() function. This calls into git_fnmatch(), which is a fairly thin wrapper around fnmatch(). Indeed, it’s easy to see that GFNM_PATHNAME, which would otherwise be converted into FNM_PATHNAME, is not passed into git_fnmatch().
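To make the idea concrete, here is a rough sketch of how a "only call fnmatch() when there is a wildcard" check might look. This is emphatically not Git's code: has_wildcard() and path_matches() are hypothetical names, the set of characters treated as special is my assumption, and the literal branch (exact match or leading directory) is a simplification of whatever tree_entry_interesting() really does.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical: does the pattern contain any glob metacharacter?
 * The character set here is an assumption, not taken from Git. */
static bool has_wildcard(const char *pattern)
{
    return strpbrk(pattern, "*?[\\") != NULL;
}

/* Hypothetical: cheap string comparison for literal patterns,
 * falling back to fnmatch() with no flags only when needed. */
static bool path_matches(const char *pattern, const char *path)
{
    if (!has_wildcard(pattern)) {
        /* Literal pattern: accept an exact match, or the pattern
         * naming a directory that contains the path. */
        size_t len = strlen(pattern);
        if (strncmp(pattern, path, len) != 0)
            return false;
        return path[len] == '\0' || path[len] == '/' ||
               (len > 0 && pattern[len - 1] == '/');
    }
    /* Wildcard pattern: no FNM_PATHNAME, so '*' may match '/'. */
    return fnmatch(pattern, path, 0) == 0;
}

/* Illustrative cases with invented paths. */
static void check_examples(void)
{
    assert(path_matches("docs", "docs/notes.txt"));       /* literal dir  */
    assert(!path_matches("docs", "docs2/notes.txt"));     /* not a prefix */
    assert(!path_matches("notes.txt", "docs/notes.txt")); /* literal file */
    assert(path_matches("*.txt", "docs/notes.txt"));      /* glob, nested */
}
```

The third case is the interesting one: a literal filename only matches at the exact path given, whereas adding a wildcard switches to the fnmatch() branch and suddenly matches at any depth.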

This glob matching behaviour is potentially quite useful, but it’s inconsistent with standard shell globbing, which behaves as if FNM_PATHNAME is set. It also makes it difficult to express things such as “match all text files in the current directory only”.

I wonder how many people will find this behaviour confusing. Mind you, probably not a massive proportion of people once you’ve already removed everyone who finds everything else about Git confusing as well.

Photo by Sanwal Deen on Unsplash