Git’s filename filtering surprisingly works as you’d expect. To an extent.
I recently needed to find the Git commits which affected files whose names matched a particular pattern. In a classic case of trying to be too clever, I hunted through all the documentation in a vain effort to find some option which would meet my requirements. You can filter by commit message, by changes added or removed within the diff, by commit time or by a bundle of other options, but I couldn’t find anything to do with the filename.
I was just about to ask a question on Stack Overflow when it suddenly occurred to me to simply try the standard commit filtering by filename but using a shell glob instead of a full filename, and I was surprised to see it simply worked — perhaps I shouldn’t have been.
So, if you want to find changes to, say, any file with a .txt extension in your repository, you can use this:
git log -- '*.txt'
This behaviour is perhaps a little surprising, because that * matches any character including slashes, which is why this pattern works recursively across the repository. Intrigued as to what was going on, I did a little judicious ltracing and found that Git does indeed call fnmatch() to match filenames in this case. Further, it doesn’t pass any flags to the call; in particular it doesn’t pass FNM_PATHNAME, which would cause wildcards to fail to match path separators.
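To make the difference concrete, here is a rough Python sketch rather than anything taken from Git itself. Python's fnmatch module happens to treat the path separator as an ordinary character, so it stands in for fnmatch() called with no flags. The match_like_fnm_pathname() helper is a hypothetical, simplified imitation of FNM_PATHNAME that I've made up for illustration; it only handles * and ?, not bracket expressions. The list of paths is invented too.

```python
import fnmatch
import re


def match_like_fnm_pathname(pattern: str, path: str) -> bool:
    """Hypothetical stand-in for fnmatch(pattern, path, FNM_PATHNAME).

    '*' and '?' are not allowed to match '/'. Bracket expressions and
    backslash escapes are deliberately not handled in this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("[^/]*")   # any run of characters, but never a slash
        elif ch == "?":
            parts.append("[^/]")    # exactly one character, but never a slash
        else:
            parts.append(re.escape(ch))
    return re.fullmatch("".join(parts), path) is not None


# Made-up repository contents, purely for illustration.
paths = ["README.txt", "docs/notes.txt", "docs/deep/er/todo.txt", "src/main.c"]

# No flags: '*' may span '/', so the pattern reaches every directory level.
no_flags = [p for p in paths if fnmatch.fnmatchcase(p, "*.txt")]

# FNM_PATHNAME-style: '*' stops at '/', so only top-level names match.
pathname = [p for p in paths if match_like_fnm_pathname("*.txt", p)]

print(no_flags)
print(pathname)
```

The first list contains all three .txt files, nested or not, which mirrors what git log -- '*.txt' showed me. The second contains only README.txt, which is what you would get if the wildcard refused to cross a slash.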
The slightly quirky thing I then observed is that Git appears to notice whether you’ve used any wildcards and decides whether or not to use fnmatch() on that basis, presumably to make operations without wildcards faster. I tried digging around in the source code and believe I’ve located where the matching is done: the slightly fearsome-looking tree_entry_interesting() function. This calls into git_fnmatch(), which is a fairly thin wrapper around fnmatch(). Indeed, it’s easy to see that GFNM_PATHNAME, which the wrapper would otherwise convert into FNM_PATHNAME, is not passed into git_fnmatch().
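The shortcut is easier to picture with a toy version. The sketch below is my own guess at the general shape, written in Python, and is not a translation of tree_entry_interesting(). The names has_wildcard() and path_matches() are hypothetical, the set of characters treated as wildcards is an assumption, and the literal branch assumes that a plain path names either an exact file or a leading directory.

```python
import fnmatch

# Assumed set of characters that make a pathspec count as a glob.
GLOB_CHARS = set("*?[\\")


def has_wildcard(pathspec: str) -> bool:
    """Return True if the pathspec contains any assumed glob character."""
    return any(ch in GLOB_CHARS for ch in pathspec)


def path_matches(pathspec: str, path: str) -> bool:
    """Hypothetical two-tier matcher: cheap string tests first, globbing last."""
    if not has_wildcard(pathspec):
        # Literal pathspec: match the exact file, or anything beneath it
        # when it names a directory. No pattern matching is needed.
        prefix = pathspec.rstrip("/") + "/"
        return path == pathspec or path.startswith(prefix)
    # Wildcards present: fall back to flag-less globbing, where '*'
    # is free to match '/' as well.
    return fnmatch.fnmatchcase(path, pathspec)


print(path_matches("docs", "docs/notes.txt"))
print(path_matches("README.txt", "docs/README.txt"))
print(path_matches("*.txt", "docs/README.txt"))
```

The point of the split is that a literal pathspec never touches the pattern matcher at all, so README.txt only matches the top-level file, while *.txt goes through the glob path and picks up docs/README.txt as well.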
This glob matching behaviour is potentially quite useful, but it’s inconsistent with standard shell glob matches, which behave as if FNM_PATHNAME is set. It also makes it difficult to express things such as “match all text files in the current directory only”.
I wonder how many people will find this behaviour confusing. Mind you, probably not a massive proportion of people once you’ve already removed everyone who finds everything else about Git confusing as well.