It’s possible to manage multiple subprocesses in Python, but there are a few gotchas.
Happy Valentine’s Day!
Right, now we’ve got that out of the way…
The subprocess module in Python is a handy interface to manage forked
commands. If you’re still using os.popen() or os.system(), give yourself a
sharp rap on the knuckles[1] and go and read that Python documentation page
right now. No really, I’ll wait.
One of the things that’s slightly tricky with subprocess, however, is the
management of multiple child processes running in parallel. This is quite
possible, but there are a few quirks.
There are two basic approaches you can take: using the select module to watch
the various file descriptors, or starting two separate threads for each
process, one for stdout and one for stderr. I’ll discuss the former option
here because it’s a little lighter on system resources, especially where many
processes are concerned. It works fine on POSIX systems[2], but on Windows you
may need to use threads[3].
The first point is that you can’t use any form of blocking I/O when
interacting with the processes, or one of the other processes may fill the pipe
it’s using to communicate with the main process and hence block. This means
that your main process needs to be continually reading from both stdout and
stderr of each process, lest they suddenly produce reams of output, but it
shouldn’t read anything unless it knows it won’t block. This is why either
select.select() or select.poll() is used.
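As a rough sketch of the idea, here is select.select() watching both pipes of a single child and reading only from descriptors reported as ready. The child command and the names fd_names and collected are made up for illustration, and this assumes a POSIX system where select works on pipes.

```python
import os
import select
import subprocess
import sys

# Hypothetical child: writes one line to stdout and one to stderr, then exits.
child_code = "import sys; sys.stdout.write('out\\n'); sys.stderr.write('err\\n')"
proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

# Map each watched file descriptor to a readable name.
fd_names = {proc.stdout.fileno(): "stdout", proc.stderr.fileno(): "stderr"}
collected = {"stdout": b"", "stderr": b""}

while fd_names:
    # Block for at most one second until at least one pipe is readable.
    readable, _, _ = select.select(list(fd_names), [], [], 1.0)
    for fd in readable:
        chunk = os.read(fd, 4096)  # will not block: select said fd is ready
        if chunk:
            collected[fd_names[fd]] += chunk
        else:
            # An empty read means the child closed this pipe; stop watching it.
            del fd_names[fd]

proc.wait()
```

Nothing is read from a pipe until select.select() has reported it as ready, so neither stream can stall the loop while the other fills up.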
The second issue is that you have to use the underlying operating system I/O
primitives such as os.read(), because the built-in Python ones have a tendency
to loop round reading until they’ve got as many bytes as requested or they
reach EOF. You could read 1 byte at a time, but that’s dreadfully inefficient.
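To make the difference concrete, here is a small sketch using a plain os.pipe() rather than a real child process (the variable names are just for illustration). os.read() hands back whatever is available, up to the requested size, instead of waiting for the full amount.

```python
import os

read_fd, write_fd = os.pipe()
os.write(write_fd, b"hello")

# Ask for up to 4096 bytes: os.read() returns as soon as some data is
# available, so we get just the 5 bytes written so far.
chunk = os.read(read_fd, 4096)

# By contrast, a buffered file object's read(4096) on this pipe would keep
# reading until it had 4096 bytes or every writer had closed the pipe.

os.close(write_fd)
eof = os.read(read_fd, 4096)  # b"" signals EOF once all writers are closed
os.close(read_fd)
```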
The third issue is that you have to manage closing down processes properly and remove their file descriptors from your watched set, or they may continually flag themselves as being read-ready (a descriptor at EOF always reports as readable).
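One way this bookkeeping might look with select.poll() is sketched below. The two child commands and the names fd_owner, output and exit_codes are assumptions for the example, and this is not the ProcPoller code itself.

```python
import os
import select
import subprocess
import sys

# Two hypothetical children, started in parallel.
commands = {
    "a": [sys.executable, "-c", "print('from a')"],
    "b": [sys.executable, "-c", "import sys; sys.stderr.write('from b\\n')"],
}

poller = select.poll()
procs = {}
fd_owner = {}  # fd -> (command key, stream name)
output = {}
for key, argv in commands.items():
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    procs[key] = proc
    output[key] = {"stdout": b"", "stderr": b""}
    for stream, pipe in (("stdout", proc.stdout), ("stderr", proc.stderr)):
        poller.register(pipe.fileno(), select.POLLIN)
        fd_owner[pipe.fileno()] = (key, stream)

while fd_owner:
    for fd, _event in poller.poll():
        key, stream = fd_owner[fd]
        chunk = os.read(fd, 4096)
        if chunk:
            output[key][stream] += chunk
            continue
        # Empty read means EOF. Unregister the descriptor, otherwise poll()
        # would keep reporting it on every call.
        poller.unregister(fd)
        del fd_owner[fd]

# Only once the pipes are drained do we reap the children and close the pipes.
exit_codes = {}
for key, proc in procs.items():
    exit_codes[key] = proc.wait()
    proc.stdout.close()
    proc.stderr.close()
```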
The fourth issue, which is a bit of an obscure one, is that you may wish to
consider processes which close either or both of stdout and stderr before
terminating, so EOF on a pipe does not by itself mean the process has exited.
You may not need to care about this.
Fortunately for all you lucky people, I’ve already written the ProcPoller
class which handles all this. I wrote this some time ago, but I only just dug
it up out of an old backup.
Essentially it works in a similar way to select.poll(): you create an
instance of it, and then you call the run_command() method to fork off
processes in the background. Each command is also passed a context object,
which can be anything hashable (in practice, something immutable such as a
string or tuple). This is used as the index to refer to that command in future.
Once everything you want to watch is running, you call the poll() method,
optionally with a timeout. This will watch both stdout and stderr of each
process and pass anything from them into the handle_output() method, which you
can override in a derived class for your own purposes.
The base class will collect this output into two strings per process in the
output dictionary, which may be fine for commands which produce little
output. For more voluminous quantities, or long-running processes, you’re
better off overriding handle_output() to parse the data as you go. You can
also return True from handle_output() to terminate the process once you
have the result you need.
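To give a feel for that shape, here is a stripped-down stand-in. To be clear, this is not the real ProcPoller: MiniPoller, its method signatures, the close() helper and the FirstLine subclass are all assumptions, only loosely modelled on the description above, and it collects bytes rather than strings.

```python
import os
import select
import subprocess
import sys


class MiniPoller:
    """Hypothetical, minimal stand-in for the interface described above."""

    def __init__(self):
        self._poller = select.poll()
        self._procs = {}  # context -> Popen
        self._fds = {}    # fd -> (context, stream name)
        self.output = {}  # context -> {"stdout": bytes, "stderr": bytes}

    def run_command(self, argv, context):
        proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        self._procs[context] = proc
        self.output[context] = {"stdout": b"", "stderr": b""}
        for stream, pipe in (("stdout", proc.stdout), ("stderr", proc.stderr)):
            self._poller.register(pipe.fileno(), select.POLLIN)
            self._fds[pipe.fileno()] = (context, stream)

    def handle_output(self, context, stream, data):
        """Default: accumulate everything. Return True to terminate the process."""
        self.output[context][stream] += data
        return False

    def _drop(self, fd):
        self._poller.unregister(fd)
        del self._fds[fd]

    def poll(self, timeout_ms=None):
        """Handle one batch of events; return True while anything is still watched."""
        for fd, _event in self._poller.poll(timeout_ms):
            if fd not in self._fds:
                continue  # dropped earlier in this batch
            context, stream = self._fds[fd]
            data = os.read(fd, 4096)
            if not data:
                self._drop(fd)  # EOF on this stream
            elif self.handle_output(context, stream, data):
                # The handler has what it needs: stop the child and its pipes.
                self._procs[context].terminate()
                for other in [f for f, (c, _s) in self._fds.items() if c == context]:
                    self._drop(other)
        return bool(self._fds)

    def close(self):
        """Reap every child and close its pipes; return exit codes by context."""
        codes = {}
        for context, proc in self._procs.items():
            codes[context] = proc.wait()
            proc.stdout.close()
            proc.stderr.close()
        return codes


class FirstLine(MiniPoller):
    """Terminate each command as soon as it has produced one full line."""

    def handle_output(self, context, stream, data):
        super().handle_output(context, stream, data)
        return b"\n" in self.output[context][stream]


poller = FirstLine()
slow_child = "import time; print('ready', flush=True); time.sleep(30)"
poller.run_command([sys.executable, "-c", slow_child], "slow")
while poller.poll(1000):
    pass
codes = poller.close()
```

Here the child would otherwise sleep for 30 seconds, but the overridden handle_output() returns True as soon as a full line arrives, so the loop finishes almost immediately.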
Do drop a comment in if you find this useful at any point, or if you need any help using it.
But seriously, whether you use this class or not, stop using os.system().
Pretty please!
[1] Preferably from a respected artist like Antony Carmichael.
[2] A portable implementation should support both select.select() and
select.poll() as some platforms only provide one of these. The implementation
I link to below uses poll() only for simplicity, but I only ever intended
this code to support Linux systems. If you look at the implementation of
communicate() on POSIX systems in the subprocess library you’ll see an
example of supporting both.
[3] If you want to see how this might be done on Windows, take a look at the
implementation of communicate() on Windows systems in the subprocess library.