☑ Tracing MacOS Filesystem Events

11 Jun 2019 at 8:19AM in Software
｜   ｜

Recently I had cause to find out where a particular process is currently writing a file on MacOS and I wanted to describe how I went about it for reference.

Now I should point out at this stage that I’m very far from a MacOS expert. I know a few basics, but generally things are slick enough that I don’t tend to need to drop down to the terminal to do a lot. As a result, I’m still discovering little corners where MacOS either provides better tools than I’m used to on Linux, or has some quirky differences to how they work.

Disclaimer aside, here’s the deal. I had this process, which I knew was downloading a file. I knew it was a very large file; but I didn’t know where it was being downloaded to. I knew the end destination, but it rapidly became clear this process was downloading it to somewhere temporary, so it could presumably rename it into place later. I wanted to monitor the size of the file so I could see how far along it was, so I could figure out how long I was able to spend making a cup of tea. Important stuff.

My go-to solution to this issue on Linux would be to locate the PID, then just ls -l /proc/<pid>/fs. This lists all open filehandles for a process and they’re shown as symlinks to the open file. Handy.

MacOS, sadly, does not have /proc in any form. A little Googling around the subject did turn up something called fs_usage, however.

This is in the same vein as strace on Linux, except it’s a little more specific. I won’t go into full details, but suffice to say it logs all filesystem (and other) activities on the machine. Or you can provide a PID or process name and it’ll focus in on that.

So I ended up running something like this:

sudo fs_usage -f filesys pid 1234


This shows all filesystem events for PID 1234. The output you get looks a little like this:

18:02:54.019999  getattrlist                            /tmp/filename
18:02:54.020828  getattrlist                            /tmp/filename
18:02:54.020907  getxattr               [ 93]           /tmp/filename
18:02:54.022703  open              F=24       (R_____)  /tmp/filename
18:02:54.022706  fcntl             F=24  <GETPATH>
18:02:54.022711  close             F=24
18:02:54.025311  open              F=24       (R_____)  /tmp/filename
18:02:54.424061  write             F=24   B=0x190d


I’ve simplified and truncated the output a little, but you get the idea.

This was great and a concrete step forward. You’ll notice, however, that the trace for read() and write() calls doesn’t print the filename that’s being manipulated. That’s probably because those calls operate only on a filehandle, and this tool doesn’t want to delve into process state, it just wants to write the parameters to the call out and get on with it.

That’s fine if, as in the trace above, you’ve captured the open() call; you can use that F=24 to link up the traces and figure out which file is being updated.

If, however, you come in halfway through then that’s not a lot of help; you’d need to persuade the process to close and re-open the file on demand, and that’s pretty hairy stuff.

What we need, then, is a way to look up this file descriptor 24 into a file path.

What we need, then is lsof.

This is a sufficiently standard Unix tool that its own Wikipedia page1, so I won’t go into an in-depth discussion. Suffice to say that its core competency is listing the open files of processes, and that’s exactly what we need here.

You can invoke it as lsof -p 1234 and it will show something like this:

COMMAND  PID      USER   FD   TYPE DEVICE SIZE/OFF       NODE NAME
Python  5715 myuser     cwd    DIR    1,4      256 4297892359 /Users/myuser
Python  5715 myuser     txt    REG    1,4    51744 4304043073 /System/Library/…/Python.app/Contents/MacOS/Python
Python  5715 myuser     txt    REG    1,4    52768 4304046592 /System/Library/…/lib-dynload/_locale.so
Python  5715 myuser     txt    REG    1,4   973824 4304094362 /usr/lib/dyld
Python  5715 myuser       0u   CHR   16,0 0t101338        903 /dev/ttys000
Python  5715 myuser       1u   CHR   16,0 0t101338        903 /dev/ttys000
Python  5715 myuser       2u   CHR   16,0 0t101338        903 /dev/ttys000
Python  5715 myuser       3r   REG    1,4     6804 4297372781 /private/etc/passwd


In this example you can see that my Python process has a cwd entry which corresponds to its current working directory (which it has open) as well as txt entries which correspond to the binary itself and various shared libraries. This will be populating the text segment.

Then we have four entries where the FD column reads 0u, 1u, 2u and 3r. The numbers represent the file descriptors of the open files within the process and as those familiar with Unix will recognise the first three correspond to standard input, output and error respectively. Perhaps slightly oddly the process has all three open for both read and write, as indicated by the u suffix; since these are all open on the terminal device I can only assume that this is just some quirk of the default way the OS creates the process.

The final file descriptor 3r shows that the process has file /private/etc/passwd open for reading (indicated by the r suffix), which is exactly right. This was an interactive Python process and I’d just run fd = open('/etc/passwd'). You’ll notice lsof is giving us the real absolute path name; I’d opened /etc/passwd but since on MacOS /etc is a symlink to /private/etc then the path that’s reported above is in that destination folder.

So now we have all the pieces we need: we fun fs_usage to find out the file descriptors that a process is accessing and then we can map these to filenames using lsof.

Frankly there are probably slightly easier ways to solve the problem, but these are probably handy utilties for future reference so I don’t regret the path2 I took.

1. Mind you, that doesn’t necessarily mean it’s not obscure. There are also surprisingly extensive wikipedia pages for correct toilet roll orientation and animals with fraudulent diplomas

2. Pun very much intended, I’m afraid. Sorry about that.

11 Jun 2019 at 8:19AM in Software
｜   ｜