It’s possible to manage multiple subprocesses in Python, but there are a few gotchas.
Happy Valentine’s Day!
Right, now we’ve got that out of the way…
The subprocess module in Python is a handy interface to manage forked commands. If you’re still using
os.system(), give yourself a sharp rap on the knuckles[1]
and go and read that Python documentation page right now. No really, I’ll wait.
One of the things that’s slightly tricky with
subprocess, however, is the
management of multiple child processes running in parallel. This is quite
possible, but there are a few quirks.
There are two basic approaches you can take: using the
select module to watch the
various file descriptors, or forking two separate threads for each process, one
for stdout and one for
stderr. I’ll discuss the former option here because
it’s a little lighter on system resources, especially where many processes are
concerned. It works fine on POSIX systems[2], but on Windows you may need
to use threads[3].
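For comparison, here’s a minimal sketch of the thread-per-stream alternative. The names `_drain` and `run_with_threads` are made up for illustration, and I’m assuming each command is given as an argument list. Blocking reads are fine here only because each one runs in its own thread.

```python
import subprocess
import threading


def _drain(pipe, chunks):
    """Read a pipe until EOF; blocking is safe because this runs in its own thread."""
    with pipe:
        for line in pipe:
            chunks.append(line)


def run_with_threads(commands):
    """Start every command, then return a (stdout, stderr, returncode) tuple for each."""
    jobs = []
    for argv in commands:
        proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = [], []
        threads = [
            threading.Thread(target=_drain, args=(proc.stdout, out)),
            threading.Thread(target=_drain, args=(proc.stderr, err)),
        ]
        for thread in threads:
            thread.start()
        jobs.append((proc, threads, out, err))

    results = []
    for proc, threads, out, err in jobs:
        for thread in threads:
            thread.join()
        results.append((b"".join(out), b"".join(err), proc.wait()))
    return results
```

That’s two extra threads per process, which is the resource cost mentioned above.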
The first point is that you can’t use any form of blocking I/O when
interacting with the processes, or one of the other processes may fill the pipe
it’s using to communicate with the main process and hence block. This means
that your main process needs to be continually reading from both stdout and
stderr of each process, lest they suddenly produce reams of output, but it
shouldn’t read anything unless it knows it won’t block. This is why either
select.select() or select.poll() is used.
The second issue is that you have to use the underlying operating system I/O
primitives such as
os.read(), because the builtin Python ones have a tendency
to loop round reading until they’ve got as many bytes as requested or they
reach EOF. You could read 1 byte at a time, but that’s dreadfully inefficient.
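A tiny illustration of that difference, using a plain pipe rather than a real subprocess (the variable names are just for this example): os.read() hands back whatever is available, up to the requested size, instead of waiting for the full amount.

```python
import os

read_fd, write_fd = os.pipe()
os.write(write_fd, b"hello")

# Ask for up to 4096 bytes: os.read() returns the 5 bytes that are already
# in the pipe rather than waiting for more to arrive.
chunk = os.read(read_fd, 4096)

# Once the write end is closed, the next read returns b"", which signals EOF.
os.close(write_fd)
eof = os.read(read_fd, 4096)
os.close(read_fd)
```

A buffered file object’s read(4096) on the same pipe would have kept waiting until it had all 4096 bytes or the writer closed its end.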
The third issue is that you have to manage closing down processes properly and removing their file descriptors from your watched set or they may continually flag themselves as being read-ready.
The fourth issue, which is a bit of an obscure one, is that you may wish to
consider processes which close either or both of stdout and stderr before
terminating; you may not need to care about this.
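To make those four points concrete, here’s a rough sketch of the kind of loop involved. It’s POSIX-only, the function name `run_all` and the shape of the result are my own choices for illustration, and it is not the code I describe below. It only reads descriptors that poll() reports as ready, uses os.read(), unregisters each descriptor on EOF, and treats stdout and stderr independently so a process closing one early doesn’t matter.

```python
import os
import select
import subprocess


def run_all(commands, chunk_size=4096):
    """Run argument lists in parallel; return {index: {"stdout", "stderr", "returncode"}}."""
    poller = select.poll()
    watched = {}   # file descriptor -> (command index, stream name)
    procs = {}
    results = {}

    for index, argv in enumerate(commands):
        proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        procs[index] = proc
        results[index] = {"stdout": b"", "stderr": b""}
        for name, pipe in (("stdout", proc.stdout), ("stderr", proc.stderr)):
            watched[pipe.fileno()] = (index, name)
            poller.register(pipe.fileno(), select.POLLIN)

    # Keep going until every stream of every process has reached EOF.
    while watched:
        for fd, _event in poller.poll():
            index, name = watched[fd]
            data = os.read(fd, chunk_size)  # won't block: poll() flagged this fd
            if data:
                results[index][name] += data
            else:
                # EOF: stop watching, or poll() keeps reporting this fd forever.
                poller.unregister(fd)
                del watched[fd]

    # All pipes are drained, so waiting on the processes can't deadlock.
    for index, proc in procs.items():
        results[index]["returncode"] = proc.wait()
        proc.stdout.close()
        proc.stderr.close()
    return results
```

Collecting everything in memory like this is only sensible for modest amounts of output.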
Fortunately for all you lucky people, I’ve already written a class
which handles all this. I wrote this some time ago, but I only just dug it up
out of an old backup.
Essentially it works in a similar way to
select.poll(): you create an
instance of it, and then you call the
run_command() method to fork off
processes in the background. Each command is also passed a value
which can be anything hashable (for example a string, a number or a tuple of those). This is used as the index to
refer to that command in future.
Once everything is running that you want to watch, you call the polling method,
optionally with a timeout if you like. This will watch both stdout and
stderr of each process and pass anything from them into the handle_output()
method, which you can override in a derived class for your own purposes.
The base class will collect this output into two strings per process in the
output dictionary, which may be fine for commands which produce little
output. For more voluminous quantities, or long-running processes, you’re
better off overriding
handle_output() to parse the data as you go. You can also use
handle_output() to terminate the process once you
have the result you need.
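If it helps to picture the shape of such a class, here’s a cut-down, hypothetical stand-in. It is not my actual code: the class names, the `poll()` and `close()` methods, the handle_output() signature and the layout of the output dictionary are all assumptions made for this sketch. Only run_command(), handle_output() and the output dictionary are names taken from the description above.

```python
import os
import select
import subprocess


class ProcessWatcher:
    """Hypothetical illustration of the interface described above (POSIX only)."""

    def __init__(self):
        self._poller = select.poll()
        self._fds = {}     # file descriptor -> (key, stream name)
        self._procs = {}   # key -> subprocess.Popen
        self.output = {}   # key -> {"stdout": bytes, "stderr": bytes}

    def run_command(self, key, argv):
        """Start argv in the background and file it under the hashable key."""
        proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        self._procs[key] = proc
        self.output[key] = {"stdout": b"", "stderr": b""}
        for name, pipe in (("stdout", proc.stdout), ("stderr", proc.stderr)):
            self._fds[pipe.fileno()] = (key, name)
            self._poller.register(pipe.fileno(), select.POLLIN)

    def handle_output(self, key, name, data):
        """Default behaviour: collect everything. Override to parse as you go."""
        self.output[key][name] += data

    def poll(self, timeout_ms=None):
        """Service ready streams once; return True while any stream is still open."""
        if not self._fds:
            return False
        for fd, _event in self._poller.poll(timeout_ms):
            key, name = self._fds[fd]
            data = os.read(fd, 4096)
            if data:
                self.handle_output(key, name, data)
            else:
                self._poller.unregister(fd)   # EOF on this stream
                del self._fds[fd]
        return bool(self._fds)

    def close(self):
        """Reap every process and return {key: return code}."""
        codes = {}
        for key, proc in self._procs.items():
            proc.stdout.close()
            proc.stderr.close()
            codes[key] = proc.wait()
        return codes


class StopAfterFirstLine(ProcessWatcher):
    """Example override: terminate a process once it has printed one full line."""

    def handle_output(self, key, name, data):
        super().handle_output(key, name, data)
        if name == "stdout" and b"\n" in self.output[key]["stdout"]:
            self._procs[key].terminate()
```

You’d drive it with something like `while watcher.poll(1000): pass` followed by `watcher.close()`.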
Do drop a comment in if you find this useful at any point, or if you need any help using it.
[1] But seriously, whether you use this class or not, stop using os.system().
[2] A portable implementation should support both select.select() and
select.poll(), as some platforms only provide one of these. The implementation
I link to below uses
poll() only for simplicity, but I only ever intended
this code to support Linux systems. If you look at the POSIX implementation of
communicate() in the
subprocess library you’ll see an example of supporting both.
[3] If you want to see how this might be done on Windows, take a
look at the Windows implementation of
communicate() in the
subprocess library.