Recently I had cause to find out where a particular process is currently writing a file on MacOS and I wanted to describe how I went about it for reference.
Now I should point out at this stage that I’m very far from a MacOS expert. I know a few basics, but generally things are slick enough that I don’t tend to need to drop down to the terminal to do a lot. As a result, I’m still discovering little corners where MacOS either provides better tools than I’m used to on Linux, or has some quirky differences to how they work.
Disclaimer aside, here’s the deal. I had this process, which I knew was downloading a file. I knew it was a very large file; but I didn’t know where it was being downloaded to. I knew the end destination, but it rapidly became clear this process was downloading it to somewhere temporary, so it could presumably rename it into place later. I wanted to monitor the size of the file so I could see how far along it was, so I could figure out how long I was able to spend making a cup of tea. Important stuff.
My go-to solution to this issue on Linux would be to locate the PID, then just
ls -l /proc/<pid>/fs. This lists all open filehandles for a process and
they’re shown as symlinks to the open file. Handy.
MacOS, sadly, does not have
/proc in any form. A little Googling around the
subject did turn up something called
This is in the same vein as
strace on Linux, except it’s a little more specific.
I won’t go into full details, but suffice to say it logs all filesystem (and other)
activities on the machine. Or you can provide a PID or process name and it’ll focus
in on that.
So I ended up running something like this:
sudo fs_usage -f filesys pid 1234
This shows all filesystem events for PID 1234. The output you get looks a little like this:
18:02:54.019999 getattrlist /tmp/filename 18:02:54.020828 getattrlist /tmp/filename 18:02:54.020907 getxattr [ 93] /tmp/filename 18:02:54.022703 open F=24 (R_____) /tmp/filename 18:02:54.022706 fcntl F=24 <GETPATH> 18:02:54.022711 close F=24 18:02:54.025311 open F=24 (R_____) /tmp/filename 18:02:54.424061 write F=24 B=0x190d
I’ve simplified and truncated the output a little, but you get the idea.
This was great and a concrete step forward. You’ll notice, however, that the
write() calls doesn’t print the filename that’s being
manipulated. That’s probably because those calls operate only on a filehandle,
and this tool doesn’t want to delve into process state, it just wants to write
the parameters to the call out and get on with it.
That’s fine if, as in the trace above, you’ve captured the
open() call; you
can use that
F=24 to link up the traces and figure out which file is being updated.
If, however, you come in halfway through then that’s not a lot of help; you’d need to persuade the process to close and re-open the file on demand, and that’s pretty hairy stuff.
What we need, then, is a way to look up this file descriptor 24 into a file path.
What we need, then is
This is a sufficiently standard Unix tool that its own Wikipedia page1, so I won’t go into an in-depth discussion. Suffice to say that its core competency is listing the open files of processes, and that’s exactly what we need here.
You can invoke it as
lsof -p 1234 and it will show something like this:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME Python 5715 myuser cwd DIR 1,4 256 4297892359 /Users/myuser Python 5715 myuser txt REG 1,4 51744 4304043073 /System/Library/…/Python.app/Contents/MacOS/Python Python 5715 myuser txt REG 1,4 52768 4304046592 /System/Library/…/lib-dynload/_locale.so Python 5715 myuser txt REG 1,4 63968 4304046668 /System/Library/…/lib-dynload/readline.so Python 5715 myuser txt REG 1,4 973824 4304094362 /usr/lib/dyld Python 5715 myuser 0u CHR 16,0 0t101338 903 /dev/ttys000 Python 5715 myuser 1u CHR 16,0 0t101338 903 /dev/ttys000 Python 5715 myuser 2u CHR 16,0 0t101338 903 /dev/ttys000 Python 5715 myuser 3r REG 1,4 6804 4297372781 /private/etc/passwd
In this example you can see that my
Python process has a
cwd entry which
corresponds to its current working directory (which it has open) as well as
txt entries which correspond to the binary itself and various shared
libraries. This will be populating the
Then we have four entries where the
FD column reads
The numbers represent the file descriptors of the open files within
the process and as those familiar with Unix will recognise the first three
correspond to standard input, output and error respectively. Perhaps slightly
oddly the process has all three open for both read and write, as indicated by
u suffix; since these are all open on the terminal device I can only
assume that this is just some quirk of the default way the OS creates the process.
The final file descriptor
3r shows that the process has file
open for reading (indicated by the
r suffix), which is exactly right. This was an
interactive Python process and I’d just run
fd = open('/etc/passwd'). You’ll
lsof is giving us the real absolute path name; I’d opened
but since on MacOS
/etc is a symlink to
/private/etc then the path that’s
reported above is in that destination folder.
So now we have all the pieces we need: we fun
fs_usage to find out the file
descriptors that a process is accessing and then we can map these to filenames
Frankly there are probably slightly easier ways to solve the problem, but these are probably handy utilties for future reference so I don’t regret the path2 I took.