Recently I had cause to find out where a particular process is currently writing a file on MacOS and I wanted to describe how I went about it for reference.
Now I should point out at this stage that I’m very far from a MacOS expert. I know a few basics, but generally things are slick enough that I don’t tend to need to drop down to the terminal to do a lot. As a result, I’m still discovering little corners where MacOS either provides better tools than I’m used to on Linux, or has some quirky differences to how they work.
Disclaimer aside, here’s the deal. I had this process, which I knew was downloading a file. I knew it was a very large file; but I didn’t know where it was being downloaded to. I knew the end destination, but it rapidly became clear this process was downloading it to somewhere temporary, so it could presumably rename it into place later. I wanted to monitor the size of the file so I could see how far along it was, so I could figure out how long I was able to spend making a cup of tea. Important stuff.
My go-to solution to this issue on Linux would be to locate the PID, then just
ls -l /proc/<pid>/fs
. This lists all open filehandles for a process and
they’re shown as symlinks to the open file. Handy.
MacOS, sadly, does not have /proc
in any form. A little Googling around the
subject did turn up something called fs_usage
, however.
This is in the same vein as strace
on Linux, except it’s a little more specific.
I won’t go into full details, but suffice to say it logs all filesystem (and other)
activities on the machine. Or you can provide a PID or process name and it’ll focus
in on that.
So I ended up running something like this:
sudo fs_usage -f filesys pid 1234
This shows all filesystem events for PID 1234. The output you get looks a little like this:
18:02:54.019999 getattrlist /tmp/filename
18:02:54.020828 getattrlist /tmp/filename
18:02:54.020907 getxattr [ 93] /tmp/filename
18:02:54.022703 open F=24 (R_____) /tmp/filename
18:02:54.022706 fcntl F=24 <GETPATH>
18:02:54.022711 close F=24
18:02:54.025311 open F=24 (R_____) /tmp/filename
18:02:54.424061 write F=24 B=0x190d
I’ve simplified and truncated the output a little, but you get the idea.
This was great and a concrete step forward. You’ll notice, however, that the
trace for read()
and write()
calls doesn’t print the filename that’s being
manipulated. That’s probably because those calls operate only on a filehandle,
and this tool doesn’t want to delve into process state, it just wants to write
the parameters to the call out and get on with it.
That’s fine if, as in the trace above, you’ve captured the open()
call; you
can use that F=24
to link up the traces and figure out which file is being updated.
If, however, you come in halfway through then that’s not a lot of help; you’d need to persuade the process to close and re-open the file on demand, and that’s pretty hairy stuff.
What we need, then, is a way to look up this file descriptor 24 into a file path.
What we need, then is lsof
.
This is a sufficiently standard Unix tool that its own Wikipedia page1, so I won’t go into an in-depth discussion. Suffice to say that its core competency is listing the open files of processes, and that’s exactly what we need here.
You can invoke it as lsof -p 1234
and it will show something like this:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
Python 5715 myuser cwd DIR 1,4 256 4297892359 /Users/myuser
Python 5715 myuser txt REG 1,4 51744 4304043073 /System/Library/…/Python.app/Contents/MacOS/Python
Python 5715 myuser txt REG 1,4 52768 4304046592 /System/Library/…/lib-dynload/_locale.so
Python 5715 myuser txt REG 1,4 63968 4304046668 /System/Library/…/lib-dynload/readline.so
Python 5715 myuser txt REG 1,4 973824 4304094362 /usr/lib/dyld
Python 5715 myuser 0u CHR 16,0 0t101338 903 /dev/ttys000
Python 5715 myuser 1u CHR 16,0 0t101338 903 /dev/ttys000
Python 5715 myuser 2u CHR 16,0 0t101338 903 /dev/ttys000
Python 5715 myuser 3r REG 1,4 6804 4297372781 /private/etc/passwd
In this example you can see that my Python
process has a cwd
entry which
corresponds to its current working directory (which it has open) as well as
txt
entries which correspond to the binary itself and various shared
libraries. This will be populating the
text segment.
Then we have four entries where the FD
column reads 0u
, 1u
, 2u
and 3r
.
The numbers represent the file descriptors of the open files within
the process and as those familiar with Unix will recognise the first three
correspond to standard input, output and error respectively. Perhaps slightly
oddly the process has all three open for both read and write, as indicated by
the u
suffix; since these are all open on the terminal device I can only
assume that this is just some quirk of the default way the OS creates the process.
The final file descriptor 3r
shows that the process has file /private/etc/passwd
open for reading (indicated by the r
suffix), which is exactly right. This was an
interactive Python process and I’d just run fd = open('/etc/passwd')
. You’ll
notice lsof
is giving us the real absolute path name; I’d opened /etc/passwd
but since on MacOS /etc
is a symlink to /private/etc
then the path that’s
reported above is in that destination folder.
So now we have all the pieces we need: we fun fs_usage
to find out the file
descriptors that a process is accessing and then we can map these to filenames
using lsof
.
Frankly there are probably slightly easier ways to solve the problem, but these are probably handy utilties for future reference so I don’t regret the path2 I took.
Mind you, that doesn’t necessarily mean it’s not obscure. There are also surprisingly extensive wikipedia pages for correct toilet roll orientation and animals with fraudulent diplomas. ↩
Pun very much intended, I’m afraid. Sorry about that. ↩