☑ Herding children with a python

It’s possible to manage multiple subprocesses in Python, but there are a few gotchas.


Happy Valentine’s Day!

Right, now we’ve got that out of the way…

The subprocess module in Python is a handy interface for managing forked commands. If you’re still using os.popen() or os.system(), give yourself a sharp rap on the knuckles[1] and go and read the subprocess documentation page right now. No really, I’ll wait.

One of the things that’s slightly tricky with subprocess, however, is the management of multiple child processes running in parallel. This is quite possible, but there are a few quirks.

There are two basic approaches you can take — use the select module to watch the various file descriptors, or spawn two separate threads for each process, one to read stdout and one to read stderr. I’ll discuss the former option here because it’s a little lighter on system resources, especially where many processes are concerned. It works fine on POSIX systems[2], but on Windows you may need to use threads[3].

The first point is that you can’t use any form of blocking I/O when interacting with the processes, or one of the other processes may fill the pipe it’s using to communicate with the main process and hence block. This means that your main process needs to be continually reading from both stdout and stderr of each process, lest they suddenly produce reams of output, but it shouldn’t read anything unless it knows the read won’t block. This is why either select.select() or select.poll() is used.

The second issue is that you have to use the underlying operating system I/O primitives such as os.read(), because the built-in Python ones have a tendency to loop round reading until they’ve got as many bytes as requested or they reach EOF. You could read 1 byte at a time, but that’s dreadfully inefficient.

The third issue is that you have to manage closing down processes properly and removing their file descriptors from your watched set or they may continually flag themselves as being read-ready.

The fourth issue, which is a bit of an obscure one, is that you may wish to consider processes which close either or both of stdout and stderr before terminating — you may not need to care about this.
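To make those issues concrete, here’s a minimal sketch of the select-based approach. The commands and the buffer size are purely illustrative, and this isn’t the interface of the class discussed below:

import os
import select
import subprocess

procs = [subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
         for cmd in (["ls", "-l"], ["uname", "-a"])]
# Map each file descriptor back to its process and stream name.
fd_map = {}
for proc in procs:
    fd_map[proc.stdout.fileno()] = (proc, "stdout")
    fd_map[proc.stderr.fileno()] = (proc, "stderr")
while fd_map:
    readable, _, _ = select.select(list(fd_map), [], [])
    for fd in readable:
        # os.read() returns whatever is available without looping.
        data = os.read(fd, 4096)
        if data:
            proc, stream = fd_map[fd]
            print("%d %s: %r" % (proc.pid, stream, data))
        else:
            # EOF: remove the descriptor or it stays "read-ready" forever.
            del fd_map[fd]
for proc in procs:
    proc.wait()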

Fortunately for all you lucky people, I’ve already written the ProcPoller class which handles all this — I wrote this some time ago, but I only just dug it up out of an old backup.

Essentially it works in a similar way to select.poll() — you create an instance of it, and then you call the run_command() method to fork off processes in the background. Each command is also passed a context object, which can be anything hashable (in practice, usually something immutable). This is used as the key to refer to that command in future.

Once everything is running that you want to watch, you call the poll() method, optionally with a timeout if you like. This will watch both stdout and stderr of each process and pass anything from them into the handle_output() method, which you can override in a derived class for your own purposes.

The base class will collect this output into two strings per process in the output dictionary, which may be fine for commands which produce little output. For more voluminous quantities, or long-running processes, you’re better off overriding handle_output() to parse the data as you go. You can also return True from handle_output() to terminate the process once you have the result you need.
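Purely as a hypothetical sketch of the shape of the thing (I’m assuming the method signatures here, so check the class itself for the real interface):

class MyPoller(ProcPoller):
    def handle_output(self, context, stream, data):
        # Hypothetical signature: the context given to run_command(),
        # which stream the data arrived on, and the data itself.
        print("%s [%s]: %r" % (context, stream, data))
        return False  # returning True would terminate the process early

poller = MyPoller()
poller.run_command(["ls", "-l"], "listing")    # command and context (assumed order)
poller.run_command(["uname", "-a"], "kernel")
poller.poll()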

Do drop a comment in if you find this useful at any point, or if you need any help using it.

But seriously, whether you use this class or not, stop using os.system(). Pretty please!

  1. Preferably from a respected artist like Antony Carmichael

  2. A portable implementation should support both select.select() and select.poll() as some platforms only provide one of these. The implementation I link to below uses poll() only for simplicity, but I only ever intended this code to support Linux systems. If you look at the implementation of communicate() on POSIX systems in the subprocess library you’ll see an example of supporting both. 

  3. If you want to see how this might be done on Windows, take a look at the implementation of communicate() on Windows systems in the subprocess library. 

14 Feb 2013 at 11:15AM by Andy Pearce in Software  | Photo by Daniel Burka on Unsplash  | Tags: python subprocess

☑ SO_REUSEADDR on Windows

The SO_REUSEADDR option has quite different functionality on Windows than it does on Unix.


Anybody who’s done more than a little work at the sockets layer will have encountered the SO_REUSEADDR socket option. For anybody who hasn’t, first a little background: when a TCP socket is closed, it’s kept lingering around for a little while in a state called TIME_WAIT. This is essentially a safeguard to prevent the port number being reused for another service until there can be a fair degree of confidence that there aren’t any outdated packets from the connection bouncing around that might get erroneously delivered to the new service, royally confusing everyone.

This behaviour is fine for client sockets, because the pool of outgoing port numbers is large enough that typically it’s not a big deal having some sockets kicking around the place (it can cause issues on extremely busy systems that are making connections at a rapid rate, but we’ll gloss over that). It’s a bit more annoying for servers which listen for connections, however, since they typically need to re-acquire the same port number when they get restarted.

To work around this issue, the SO_REUSEADDR socket option was created which allows the port number to be reused by a new socket immediately. On Unix systems this means that if the old socket is in TIME_WAIT (and perhaps some of the other idle states) then a new socket can come along and steal the port number. Since the TIME_WAIT state is really just a precaution then this is generally pretty safe.
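In Python terms, for example, a server would set the option on its listening socket just before bind(); the port number here is purely illustrative:

import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Allow bind() to succeed even if the previous socket is in TIME_WAIT.
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("", 8080))
srv.listen(5)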

So far I’ve been discussing the behaviour of this option on Unix systems, but the same option also exists on Windows. However, I recently discovered that its operation is actually quite different. Instead of allowing a socket to “steal” the port number from an inactive previous one, it actually allows a socket to “steal” the port from any other socket, including actively listening ones. This could be fine if one socket is UDP and the other TCP, for example, but Windows will happily allow two or more active TCP sockets to share port numbers.

This MSDN page explains how the option works under Windows — the upshot is that if two sockets end up sharing a port then the behaviour of an incoming connection is undefined. Helpful.

As the article goes on to point out, this is a bit of a security problem — you can have a trusted service listening on a particular port, and some piece of malware can come along and silently steal the port, intercepting all the connections. That’s a brilliant opportunity for a man-in-the-middle attack (admittedly one requiring the ability to run code on the server machine).

Microsoft’s solution to this was not, as you might expect, to change the operation of SO_REUSEADDR, but to add a new option, SO_EXCLUSIVEADDRUSE, which explicitly prevents any other socket from listening on the same port, even if it uses SO_REUSEADDR. Ah, why use one socket option when two will do?
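Python exposes the new option on Windows builds, so a service which wants to defend its port can do something like the following (again, the port number is illustrative):

import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# SO_EXCLUSIVEADDRUSE only exists on Windows, hence the guard.
if hasattr(socket, "SO_EXCLUSIVEADDRUSE"):
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_EXCLUSIVEADDRUSE, 1)
srv.bind(("", 8080))
srv.listen(5)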

Anyway, definitely something to bear in mind when writing Windows services (and, perhaps more importantly, when debugging them).

1 Feb 2013 at 12:42PM by Andy Pearce in Software  | Photo by Dmitry Ratushny on Unsplash  | Tags: windows sockets

☑ Sharing pthreads locks between processes

How to share pthreads primitives across processes.


The POSIX threads library has some useful primitives for locking between multiple threads, primarily mutexes and condition variables.

Typically these are only effective for locking between threads within the same process. However, pthreads defines a PTHREAD_PROCESS_SHARED attribute for both of these primitives which can be used to specify that they should also work between processes.

This attribute only indicates that the primitive will be used in a way which is compatible with shared access, however — application code must still arrange to store it in shared memory. That’s fairly easily arranged with an anonymous mmap(), or with shmget() and shmat() if your processes don’t have a parent/child relationship.

The need for shared memory makes things a little more complicated, which is disappointing, but it still seems to me that synchronising processes with these primitives is a little more elegant than using pipes or something similar (even if pipes are probably a little more portable).

I’ve added a code example to my wiki which illustrates this. I’ve used an anonymous mmap() for shared memory — the previous revision of the page used System V shared memory, but since that isn’t cleaned up automatically on application exit, the mmap() approach is safer.
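If you just want a flavour of the approach without leaving this page, here’s a rough sketch using Python’s ctypes rather than C. It leans on several assumptions (Linux, glibc’s value for PTHREAD_PROCESS_SHARED, and the sizes of the pthreads types), so treat it as illustrative only:

import ctypes
import mmap
import os

libc = ctypes.CDLL(None, use_errno=True)  # resolves the pthreads symbols
PTHREAD_PROCESS_SHARED = 1  # glibc's value for this constant (assumption)
# Anonymous mappings are shared with child processes across fork().
shm = mmap.mmap(-1, mmap.PAGESIZE)
# Place the mutex at the start of the shared page.
mutex = ctypes.c_void_p(ctypes.addressof(ctypes.c_char.from_buffer(shm)))
# Mark the mutex as process-shared before initialising it.
attr = ctypes.create_string_buffer(64)  # oversized for pthread_mutexattr_t
libc.pthread_mutexattr_init(attr)
libc.pthread_mutexattr_setpshared(attr, PTHREAD_PROCESS_SHARED)
libc.pthread_mutex_init(mutex, attr)
if os.fork() == 0:
    libc.pthread_mutex_lock(mutex)
    print("child holds the mutex")
    libc.pthread_mutex_unlock(mutex)
    os._exit(0)
libc.pthread_mutex_lock(mutex)
print("parent holds the mutex")
libc.pthread_mutex_unlock(mutex)
os.wait()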

31 Jan 2013 at 1:02PM by Andy Pearce in Software  | Photo by Elaine Casap on Unsplash  | Tags: pthreads  linux  ipc posix

☑ apport in a storm

Ubuntu’s apport service is less than helpful for developers — learn how to disable it.


I’m not sure quite when it appeared, but Ubuntu 12.04 has a service called apport which is a bit of a pain.

It’s started by default and appears to attempt to collect crash dump information, presumably to send back to Ubuntu. I’ve seen little notification icons appearing at one time or another which I assume are this service doing its job. I’m not going to get into the potential privacy implications of sending core dumps of possibly commercial third-party software (the rest of the system may filter the core dumps, I’m not sure), but if you’re a developer trying to get hold of core dumps yourself this is annoying. Indeed, I discovered this application when trying to figure out where my core dumps were disappearing to.

What happens is that the Upstart job for the service alters the value of /proc/sys/kernel/core_pattern, which controls where core dump files are written, so that core dumps are piped into /usr/share/apport/apport, which is a Python script. This script attempts to write the core dump to the current directory first, and then writes an annotated copy elsewhere, which is presumably picked up by some other background job which puts cutesy little exclamation marks into the notification area or something similar.
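While the service is running, core_pattern therefore contains something like the following (the exact arguments passed to the script vary between releases):

|/usr/share/apport/apport %p %s %c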

In principle, this isn’t all that terrible — after all, general users wouldn’t know a core dump if it bit them on the rear (and you really wouldn’t want a dump biting you in the rear). However, one rather crucial problem is that it ignores any previously set value of core_pattern and simply dumps its own value over it. The script always attempts to dump the core file into the current directory, regardless of the old value of core_pattern.

This is a bit of an issue because I use union mounts for building software, to avoid cluttering up my source directories with build artifacts, and these often confuse the kernel when it tries to generate core dumps. As a result, I change core_pattern to store all core dumps in a central location, with a sysctl entry something like this:
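
# %e = executable name, %p = PID, %t = timestamp
kernel.core_pattern = /var/cores/core.%e.%p.%t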


This worked fine until I rebooted for some reason and suddenly apport stuck its ugly nose in.

Anyway, fortunately it’s pretty easy to disable. First edit /etc/default/apport and set enabled to 0:

# set this to 0 to disable apport, or to 1 to enable it
# you can temporarily override this with
# sudo service apport start force_start=1
enabled=0

Then, just stop the service:

sudo service apport stop

At this point you’ll want to check that stopping the service has restored the value of core_pattern to something sensible, and update it yourself if not.

That’s one more developer annoyance in Ubuntu taken care of.

29 Jan 2013 at 1:16PM by Andy Pearce in Software  | Photo by Anna Goncharova on Unsplash  | Tags: linux  ubuntu debugging

☑ Bash expansion weirdness

Expanding a substring of “$*” in bash seems to magically add command-line parameter zero.


Here’s a quirky one. In the bash shell, "$*" expands to a single string containing the positional parameters from one upwards, joined by spaces (strictly, by the first character of IFS). So if you have the following script as script.sh:

#!/bin/bash
echo Arguments: "$*"

… and you call it as ./script.sh arg1 arg2 arg3 then you’ll get the following output:

Arguments: arg1 arg2 arg3

So far so tedious. However, if we use some of bash’s parameter expansion rules to select a substring (in this case the substring starting at zero, or the whole string in fact):

echo Arguments: "${*:0}"

… then suddenly you get parameter zero (the filename of the script) included:

Arguments: ./script.sh arg1 arg2 arg3
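For comparison, explicitly asking for the substring starting at one:

echo Arguments: "${*:1}"

… gives the usual list again:

Arguments: arg1 arg2 arg3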

It turns out this is documented, if subtle, behaviour rather than a bug: the bash manual notes that substring indexing is zero-based in general, but starts at 1 when applied to the positional parameters, and that an offset of 0 prefixes $0 to the list. It’s still one of those issues that’s pretty tough to Google around, though, so hopefully this saves someone some head-scratching.

25 Jan 2013 at 11:12AM by Andy Pearce in Software  | Photo by Vicko Mozara  | Tags: linux bash
