I recently spotted that Python 3.5 has added yet more features to make coroutines more straightforward to implement and use. Since I’m well behind the curve I thought I’d bring myself back up to date over a series of blog posts, each going over some functionality added in successive Python versions — this one covers parts of the
asyncio module that was added in Python 3.4.
This is part 2 of the “State of Python Coroutines” series of posts.
In the previous post I discussed the state of coroutines in Python 2.x and then the yield from enhancement added in Python 3.3. Since that release there’s been a succession of improvements for coroutines and in this post I’m going to discuss those that were added as part of the asyncio module.
It’s a pretty large module and covers quite a wide variety of functionality, so covering all that with in-depth discussion and examples is outside the scope of this series of articles. I’ll try to touch on the finer points, however — in this article I’ll discuss the elements that are relevant to coroutines directly and then in the following post I’ll talk about the IO aspects.
Python 2 programmers may recall the venerable asyncore module, which was added way back in the prehistory of Python 1.5.2. Its purpose was to assist in writing endpoints that handle IO from sources such as sockets asynchronously. To create clients you derive your own class from asyncore.dispatcher and override methods to handle events.
This was a helpful module for basic use-cases but it wasn’t particularly flexible if what you wanted didn’t quite match its structure. Generally I found I just ended up rolling my own polling loop based on things from the select module as I needed them (although if I were using Python 3.4 or above then I’d prefer the selectors module).
If you’re wondering why talk of an old asynchronous IO module is relevant to a series on coroutines, bear with me.
The limitations of
asyncore were well understood and several third party
libraries sprang up as alternatives, one of the most popular being
Twisted. However, it was always a little annoying that such a
common use-case wasn’t well catered for within the standard library.
Back in 2011 PEP 3153 was created to address this deficiency. It didn’t really have a concrete proposal, however; it just defined the requirements. Guido addressed this in 2012 with PEP 3156 and the asyncio library was born.
The library went through some iterations under the codename Tulip and a couple of years later it was included in the standard library of Python 3.4. This was on a provisional basis — this means that it’s there, it’s not going away, but the core developers reserve the right to make incompatible changes prior to it being finalised.
OK, still not seeing the link with coroutines? Well, as well as handling IO,
asyncio also has a handy event loop for scheduling
coroutines. This is because the entire library is designed for use in two
different ways depending on your preferences — either a more traditional
callback-based scheme, where callbacks are invoked on events; or with a set
of coroutines which can each block until there’s IO activity for them to
process. Even if you don’t need to do IO, the coroutine scheduler is a useful
piece that you don’t need to build yourself.
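To make the two styles concrete, here is a minimal sketch that uses timers in place of IO events. The names on_timer, wait_then_record and run_both_styles are made up for illustration. It is written with the async/await syntax of Python 3.5 and later so that it runs on current interpreters; under Python 3.4 the coroutine would be spelled with the @asyncio.coroutine decorator and yield from instead.

```python
import asyncio


def run_both_styles():
    """Schedule one callback and one coroutine on the same event loop."""
    loop = asyncio.new_event_loop()
    events = []

    # Callback style: the loop calls a plain function when the timer fires.
    def on_timer(label):
        events.append(label)

    # Coroutine style: the function suspends itself at the await and the
    # loop resumes it once the sleep has elapsed.
    async def wait_then_record(label, delay):
        await asyncio.sleep(delay)
        events.append(label)

    loop.call_later(0.01, on_timer, "callback")
    try:
        loop.run_until_complete(wait_then_record("coroutine", 0.05))
    finally:
        loop.close()
    return events
```

Both pieces of work are driven by the same loop; the only difference is whether the code reacting to the event is a function the loop calls or a coroutine the loop resumes.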
At this point it would be helpful to consider a quick example of what asyncio can do on the scheduling front without worrying too much about IO; we’ll cover that in the next post.
In the example below, therefore, I’ve implemented something like logrotate. Mine is extremely simple[1] and doesn’t run off a configuration file, of course, because it’s just for demonstration purposes.
First here’s the code — see if you can work out what it does, then I’ll explain the finer points below.
[Code listing: log rotation example, 67 lines, not preserved.]
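The following is a minimal sketch of a script along the lines described here, not the original listing. All of the names (rotate_file, rotate_by_interval, rotate_by_size, the run_for parameter and demo) are assumptions made for illustration, as are the two policies themselves. It uses async/await rather than the Python 3.4 spelling of @asyncio.coroutine and yield from, so that it runs on current interpreters.

```python
import asyncio
import os
import tempfile


def rotate_file(path, keep=3):
    """Shift path -> path.1 -> path.2 and so on, keeping `keep` backups."""
    if not os.path.exists(path):
        return False
    # Work from the oldest backup down so nothing is overwritten too early.
    for index in range(keep - 1, 0, -1):
        older = "{}.{}".format(path, index)
        if os.path.exists(older):
            os.replace(older, "{}.{}".format(path, index + 1))
    os.replace(path, path + ".1")
    return True


async def rotate_by_interval(path, interval, keep=3):
    """Policy: rotate every `interval` seconds, regardless of size."""
    while True:
        await asyncio.sleep(interval)
        rotate_file(path, keep)


async def rotate_by_size(path, max_bytes, poll_interval=1.0, keep=3):
    """Policy: poll the file and rotate once it grows past `max_bytes`."""
    while True:
        await asyncio.sleep(poll_interval)
        if os.path.exists(path) and os.path.getsize(path) > max_bytes:
            rotate_file(path, keep)


def main(jobs, run_for=None):
    """Wrap each coroutine in a task and let the event loop schedule them.

    With run_for=None this runs until interrupted; a number of seconds
    stops the loop after that long (handy for trying it out).
    """
    loop = asyncio.new_event_loop()
    tasks = [loop.create_task(job) for job in jobs]
    if run_for is not None:
        loop.call_later(run_for, loop.stop)
    try:
        loop.run_forever()
    finally:
        # Cancel the policies and give the loop a chance to unwind them.
        for task in tasks:
            task.cancel()
        if tasks:
            loop.run_until_complete(
                asyncio.gather(*tasks, return_exceptions=True))
        loop.close()


def demo():
    """Run both policies briefly against throwaway files."""
    with tempfile.TemporaryDirectory() as directory:
        timed = os.path.join(directory, "timed.log")
        sized = os.path.join(directory, "sized.log")
        for path in (timed, sized):
            with open(path, "w") as handle:
                handle.write("x" * 100)
        main([rotate_by_interval(timed, 0.02),
              rotate_by_size(sized, max_bytes=10, poll_interval=0.02)],
             run_for=0.3)
        return sorted(os.listdir(directory))
```

Calling demo() leaves only the rotated copies behind in the temporary directory, since nothing recreates the original files after the first rotation. A real deployment would call main() without run_for and with much longer intervals.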
Each file rotation policy that I’ve implemented is its own coroutine. Each one operates independently of the others and the underlying function just factors out the common task of actually rotating the files. In this case they all delegate their waiting to the asyncio.sleep() function as a convenience, but it would be equally possible to write a coroutine which does something more clever, like hooking into inotify.
You can see that main() just creates a bunch of tasks and plugs them into an event loop, then asyncio takes care of the scheduling. This script is designed to run until terminated so it uses the simple run_forever() method of the loop, but there are also methods to run until a particular coroutine completes or just wait for one or more specific futures.
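As a rough illustration of those alternatives, the sketch below uses run_until_complete() to drive a single coroutine to completion and asyncio.wait() to wait on a specific set of tasks. The countdown, wait_for_all and run_examples names are hypothetical, and the code uses the newer async/await syntax so that it runs on current Python versions.

```python
import asyncio


async def countdown(name, steps, delay=0.01):
    """Sleep `steps` times, then report back."""
    for _ in range(steps):
        await asyncio.sleep(delay)
    return "{} finished after {} steps".format(name, steps)


async def wait_for_all():
    """Start two tasks and wait until both of them are done."""
    tasks = [asyncio.ensure_future(countdown("a", 1)),
             asyncio.ensure_future(countdown("b", 2))]
    done, pending = await asyncio.wait(tasks)
    return sorted(task.result() for task in done), len(pending)


def run_examples():
    loop = asyncio.new_event_loop()
    try:
        # Run the loop only until this one coroutine has returned.
        single = loop.run_until_complete(countdown("solo", 2))
        # Run the loop until a specific group of tasks has finished.
        several = loop.run_until_complete(wait_for_all())
    finally:
        loop.close()
    return single, several
```

Unlike run_forever(), both calls return control to the caller as soon as the work they were given has finished, along with its result.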
Under the hood the @asyncio.coroutine decorator marks the function as a coroutine; this may be required for disambiguation in parts of asyncio where the code needs to handle coroutines differently from regular callback functions. The create_task() call then wraps the coroutine instance in a Task class. Task is a subclass of Future and this is where the coroutine and callback worlds meet.
An asyncio.Future represents the future result of an asynchronous process. Completion callbacks can be registered with it using the add_done_callback() method. When the asynchronous result is ready it’s passed to the Future with the set_result() method, and at this point any registered completion callbacks are invoked. It’s easy to see, then, how the Task class is a simple wrapper which waits for the result of its wrapped coroutine to be ready and passes it to the parent Future for invocation of callbacks. In this way, the coroutine and callback worlds can coexist quite happily; in fact in many ways the coroutine interface is a layer implemented on top of the callbacks. It’s a pretty crucial layer in making the whole thing cleaner and more manageable for the programmer, however.
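The relationship between Future, Task and completion callbacks can be sketched as follows. The record, compute and future_and_task_demo names are invented for this example, and compute is written with async/await so that the sketch runs on current Python versions.

```python
import asyncio


def future_and_task_demo():
    loop = asyncio.new_event_loop()
    seen = []

    def record(fut):
        # Completion callbacks receive the finished future itself.
        seen.append(fut.result())

    async def compute():
        await asyncio.sleep(0.01)
        return 42

    try:
        # A bare Future: some other piece of code supplies the result later.
        future = loop.create_future()
        future.add_done_callback(record)
        loop.call_later(0.01, future.set_result, "from set_result")
        loop.run_until_complete(future)

        # A Task wraps a coroutine and sets its own result when the
        # coroutine returns, which triggers the same callback machinery.
        task = loop.create_task(compute())
        task.add_done_callback(record)
        loop.run_until_complete(task)
        is_future = isinstance(task, asyncio.Future)
    finally:
        loop.close()
    return seen, is_future
```

The same record callback serves both cases, because from the callback’s point of view a Task is just a Future whose result happens to come from a coroutine.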
The part that links it all together is the event loop, which asyncio gives you for free. There are a few details I’ve glossed over, however, since they’re not too important for a basic understanding. One thing to be aware of is that there are currently two event loop implementations: most people will be using SelectorEventLoop, but on Windows there’s also the ProactorEventLoop, which uses different underlying primitives and has different tradeoffs.
This scheduling may all seem simplistic, and it’s true that in this example asyncio isn’t doing anything hugely difficult. But building your own event loop isn’t quite as trivial as it sounds. There are quite a few gotchas that can trip you up and leave your code locked up or sleeping forever. This is particularly acute when you introduce IO into the equation, where there are some slightly surprising edge cases that people often miss, such as handling sockets which have performed a remote shutdown. Also, this approach is quite modular and manages to produce single-threaded code where different asynchronous operations interoperate with little or no awareness of each other. This can also be achieved with threading, of course, but this way we don’t need locks and we can more or less rule out issues such as race conditions.
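To illustrate the point about locks, here is a small sketch in which several coroutines update shared state without any synchronisation. The deposit, run_workers and run_deposits names are hypothetical. It relies on the fact that a coroutine can only be suspended at an await, so a read-modify-write with no await in the middle cannot be interleaved with another coroutine.

```python
import asyncio


async def deposit(account, amount, times):
    for _ in range(times):
        # No await between the read and the write, so no other
        # coroutine can run in between them.
        balance = account["balance"]
        account["balance"] = balance + amount
        # Explicit switch point: other coroutines get a turn here.
        await asyncio.sleep(0)


async def run_workers(account, workers, times):
    await asyncio.gather(*(deposit(account, 1, times)
                           for _ in range(workers)))


def run_deposits(workers=5, times=100):
    account = {"balance": 0}
    loop = asyncio.new_event_loop()
    try:
        loop.run_until_complete(run_workers(account, workers, times))
    finally:
        loop.close()
    return account["balance"]
```

With the defaults the final balance is always 500. The guarantee only covers code between switch points, though: if an await were placed between the read and the write, updates could be lost just as they can with threads.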
That wraps it up for this article. I’ll cover the IO aspects of asyncio in my next post, covering and comparing both the callback and coroutine based approaches to using it. This is particularly important because one area where coroutines really shine (vs threads) is where your application is primarily IO-bound and so there’s no need to explode over multiple cores.
[1] Where the only credit to which I’m referring are SmugPoints(tm): a currency that sadly only really has any traction inside the privacy of your own skull.