I recently spotted that Python 3.5 has added yet more features to make coroutines more straightforward to implement and use. Since I’m well behind the curve I thought I’d bring myself back up to date over a series of blog posts, each going over some functionality added in successive Python versions. This one covers parts of the asyncio
module that was added in Python 3.4.
This is the 2nd of the 4 articles that currently make up the “State of Python Coroutines” series.
In the previous post I discussed the state of coroutines in
Python 2.x and then the yield from
enhancement added in Python 3.3. Since
that release there’s been a succession of improvements for coroutines and in
this post I’m going to discuss those that were added as part of the
asyncio
module.
It’s a pretty large module and covers quite a wide variety of functionality, so covering all that with in-depth discussion and examples is outside the scope of this series of articles. I’ll try to touch on the finer points, however — in this article I’ll discuss the elements that are relevant to coroutines directly and then in the following post I’ll talk about the IO aspects.
Python 2 programmers may recall the venerable asyncore
module, which was added way back in the prehistory of Python 1.5.2. Its
purpose was to assist in writing endpoints that handle IO from sources
such as sockets asynchronously. To create clients you derive your own class
from asyncore.dispatcher
and override methods to handle events.
This was a helpful module for basic use-cases but it wasn’t particularly
flexible if what you wanted didn’t quite match its structure. Generally
I found I just ended up rolling my own polling loop based on things from
the select
module as I needed them (although if I were using
Python 3.4 or above then I’d prefer the selectors
module).
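As a rough illustration of the kind of hand-rolled polling loop meant here, the sketch below uses the selectors module. The names poll_once(), read_handler() and demo() are hypothetical, and a socketpair stands in for a real network connection.

```python
import selectors
import socket


def read_handler(sock):
    """Hypothetical handler: read whatever is waiting on the socket."""
    return sock.recv(1024)


def poll_once(sel, timeout=1.0):
    """Run one pass of the polling loop, dispatching each ready socket."""
    results = []
    for key, _events in sel.select(timeout=timeout):
        handler = key.data  # the callback stored at registration time
        results.append(handler(key.fileobj))
    return results


def demo():
    # socketpair() gives two connected sockets without touching the network.
    left, right = socket.socketpair()
    sel = selectors.DefaultSelector()
    try:
        sel.register(right, selectors.EVENT_READ, read_handler)
        left.sendall(b"ping")
        return poll_once(sel)
    finally:
        sel.close()
        left.close()
        right.close()
```

A real loop would call poll_once() repeatedly and also deal with timeouts, write readiness and closed connections, which is exactly the bookkeeping that tends to get rewritten in every project.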
If you’re wondering why talk of an old asynchronous IO module is relevant to a series on coroutines, bear with me.
The limitations of asyncore
were well understood and several third party
libraries sprang up as alternatives, one of the most popular being
Twisted. However, it was always a little annoying that such a
common use-case wasn’t well catered for within the standard library.
Back in 2011 PEP 3153 was created to address this deficiency.
It didn’t really contain a concrete proposal, however; it just defined the
requirements. Guido addressed this in 2012 with PEP 3156 and
the fledgling asyncio
library was born.
The library went through some iterations under the codename Tulip and a couple of years later it was included in the standard library of Python 3.4. This was on a provisional basis — this means that it’s there, it’s not going away, but the core developers reserve the right to make incompatible changes prior to it being finalised.
OK, still not seeing the link with coroutines? Well, as well as handling IO
asynchronously, asyncio
also has a handy event loop for scheduling
coroutines. This is because the entire library is designed for use in two
different ways depending on your preferences — either a more traditional
callback-based scheme, where callbacks are invoked on events; or with a set
of coroutines which can each block until there’s IO activity for them to
process. Even if you don’t need to do IO, the coroutine scheduler is a useful
piece that you don’t need to build yourself.
At this point it would be helpful to consider a quick example of what asyncio
can do on the scheduling front without worrying too much about IO — we’ll
cover that in the next post.
In the example below, therefore, I’ve implemented something like logrotate. Mine is extremely simple[1] and doesn’t run off a configuration file, of course, because it’s just for demonstration purposes.
First here’s the code — see if you can work out what it does, then I’ll explain the finer points below.
[The original 67-line code listing did not survive in this copy.]
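What follows is a hedged reconstruction of what such a script might look like, not the original listing. The names rotate_file(), rotate_daily() and main() come from the discussion below; rotate_by_size(), seconds_until_midnight(), the keep parameter and the file paths are assumptions made for illustration. It also uses the async def / await syntax rather than the @asyncio.coroutine decorator and yield from, because the decorator was removed in Python 3.11.

```python
import asyncio
import datetime
import os


def rotate_file(path, keep=5):
    """Shift path -> path.1 -> path.2 ..., discarding anything beyond `keep`."""
    if not os.path.exists(path):
        return False
    oldest = "{}.{}".format(path, keep)
    if os.path.exists(oldest):
        os.remove(oldest)
    for index in range(keep - 1, 0, -1):
        source = "{}.{}".format(path, index)
        if os.path.exists(source):
            os.replace(source, "{}.{}".format(path, index + 1))
    os.replace(path, path + ".1")
    return True


def seconds_until_midnight(now):
    """Naive local-time arithmetic; deliberately ignores DST changes."""
    tomorrow = (now + datetime.timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return (tomorrow - now).total_seconds()


async def rotate_daily(path, keep=5):
    """Policy: rotate once a day, at local midnight."""
    while True:
        await asyncio.sleep(seconds_until_midnight(datetime.datetime.now()))
        rotate_file(path, keep)


async def rotate_by_size(path, max_bytes, check_interval=60, keep=5):
    """Policy (hypothetical): rotate whenever the file grows past max_bytes."""
    while True:
        await asyncio.sleep(check_interval)
        if os.path.exists(path) and os.path.getsize(path) >= max_bytes:
            rotate_file(path, keep)


def main():
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    # The paths below are placeholders, not taken from the original script.
    tasks = [
        loop.create_task(rotate_daily("/tmp/daily.log")),
        loop.create_task(rotate_by_size("/tmp/big.log", max_bytes=1024 * 1024)),
    ]
    try:
        loop.run_forever()
    except KeyboardInterrupt:
        pass
    finally:
        # Cancel the policies and let them unwind before closing the loop.
        for task in tasks:
            task.cancel()
        loop.run_until_complete(asyncio.gather(*tasks, return_exceptions=True))
        loop.close()
```

Calling main() runs both policies until the process is interrupted.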
Each file rotation policy that I’ve implemented is its own coroutine. Each one
operates independently of the others and the underlying rotate_file()
function is just to refactor out the common task of actually rotating the files.
In this case they all delegate their waiting to the asyncio.sleep()
function
as a convenience, but it would be equally possible to write a coroutine which
does something more clever, like hook into inotify,
for example.
You can see that main()
just creates a bunch of tasks and plugs them into
an event loop, then asyncio
takes care of the scheduling. This script is
designed to run until terminated so it uses the simple run_forever()
method
of the loop, but there are also methods to
run until a particular coroutine completes
or just wait for one or more specific futures.
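For comparison with run_forever(), here is a small sketch of those other two styles: running the loop until one coroutine finishes, and waiting on a set of futures. The work() coroutine and both helper functions are hypothetical names, and the code uses the newer async def syntax.

```python
import asyncio


async def work(name, delay):
    """Hypothetical unit of work: sleep briefly, then return its name."""
    await asyncio.sleep(delay)
    return name


def run_one():
    # Run the loop only until this single coroutine has completed.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(work("single", 0.01))
    finally:
        loop.close()


def run_several():
    async def wait_for_all():
        tasks = [asyncio.ensure_future(work(n, 0.01)) for n in ("a", "b")]
        # asyncio.wait() returns the sets of finished and unfinished futures.
        done, pending = await asyncio.wait(tasks)
        return sorted(task.result() for task in done), len(pending)

    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(wait_for_all())
    finally:
        loop.close()
```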
Under the hood the @asyncio.coroutine
decorator
marks the function as a coroutine such that
asyncio.iscoroutinefunction()
returns True
—
this may be required for disambiguation in parts of asyncio
where the
code needs to handle coroutines differently from regular callback functions.
The create_task()
call then wraps the coroutine instance
in a Task
class — Task
is a subclass of
Future
and this is where the coroutine and callback worlds meet.
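The relationship between these pieces can be checked directly. This sketch assumes a recent Python, so it defines the coroutine with async def (the @asyncio.coroutine decorator was removed in Python 3.11) and uses inspect.iscoroutinefunction() in place of the asyncio variant; answer() and inspect_task() are hypothetical names.

```python
import asyncio
import inspect


async def answer():
    return 42


def inspect_task():
    loop = asyncio.new_event_loop()
    try:
        # create_task() wraps the coroutine object in a Task...
        task = loop.create_task(answer())
        # ...and a Task is itself a Future.
        is_future = isinstance(task, asyncio.Future)
        result = loop.run_until_complete(task)
        return is_future, result
    finally:
        loop.close()
```

Here inspect.iscoroutinefunction(answer) is true for the function itself, while calling answer() only produces a coroutine object that does nothing until a Task drives it.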
An asyncio.Future
represents the future result of an asynchronous
process. Completion callbacks can be registered with it using the
add_done_callback()
method. When the
asynchronous result is ready it’s passed to the Future
with the
set_result()
method, at which point any registered
completion callbacks are scheduled to run on the event loop. It’s easy to see, then, how the
Task
class is a simple wrapper which waits for the result of its
wrapped coroutine to be ready and passes it to the parent Future
class
for invocation of callbacks. In this way, the coroutine and callback
worlds can coexist quite happily — in fact in many ways the coroutine
interface is a layer implemented on top of the callbacks. It’s a pretty
crucial layer in making the whole thing cleaner and more manageable
for the programmer, however.
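The callback side of a Future can be seen in isolation with a sketch like the following, where future_demo() is a hypothetical helper and call_later() stands in for whatever asynchronous process eventually produces the result.

```python
import asyncio


def future_demo():
    loop = asyncio.new_event_loop()
    seen = []
    try:
        future = loop.create_future()
        # Register a completion callback; it receives the finished Future.
        future.add_done_callback(lambda fut: seen.append(fut.result()))
        # Simulate an asynchronous producer delivering the result a little later.
        loop.call_later(0.01, future.set_result, "ready")
        # Run the loop until the Future has a result.
        value = loop.run_until_complete(future)
        return value, seen
    finally:
        loop.close()
```

The same result reaches both the code waiting on the Future and the registered callback, which is roughly the bridge a Task provides for a coroutine.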
The part that links it all together is the event loop, which asyncio
just
gives you for free. There are a few details I’ve glossed over, however,
since they’re not too important for a basic understanding. One thing to be aware
of is that there are currently two event loop implementations — most
people will be using SelectorEventLoop, but
on Windows there’s also the ProactorEventLoop
which uses different underlying primitives and has different tradeoffs.
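If you want a particular implementation rather than the platform default, the loop class can be instantiated directly. This is a minimal sketch with hypothetical ping() and run_on_selector_loop() names; note that asyncio.ProactorEventLoop is only defined on Windows, where it has been the default since Python 3.8.

```python
import asyncio


async def ping():
    await asyncio.sleep(0)
    return "pong"


def run_on_selector_loop():
    # Explicitly pick the selector-based loop instead of the default.
    loop = asyncio.SelectorEventLoop()
    try:
        return loop.run_until_complete(ping())
    finally:
        loop.close()
```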
This scheduling may all seem simplistic, and it’s true that in this
example asyncio
isn’t
doing anything hugely difficult. But building your own event loop isn’t
quite as trivial as it sounds — there are quite a few gotchas that can trip
you up and leave your code locked up or sleeping forever. This is particularly
acute when you introduce IO into the equation, where there are some slightly
surprising edge cases that people often miss such as handling sockets which
have performed a remote shutdown. Also, this approach is quite modular and
manages to produce single-threaded code where different asynchronous
operations interoperate with little or no awareness of each other. This
can also be achieved with threading, of course, but this way we don’t
need locks and we can more or less rule out issues such as race conditions
and deadlocks.
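To make the point about locks concrete, here is a sketch in which several coroutines update shared state with no synchronisation at all. It relies on the fact that a coroutine only gives up control at an await, so the read-modify-write below cannot be interrupted; bump() and run_counters() are hypothetical names.

```python
import asyncio


async def bump(counter, times):
    for _ in range(times):
        current = counter["value"]
        counter["value"] = current + 1  # no await between the read and the write
        await asyncio.sleep(0)          # only here can another coroutine run


def run_counters():
    counter = {"value": 0}

    async def runner():
        await asyncio.gather(*(bump(counter, 100) for _ in range(3)))

    loop = asyncio.new_event_loop()
    try:
        loop.run_until_complete(runner())
    finally:
        loop.close()
    return counter["value"]
```

The equivalent threaded code would need a lock around the increment to be sure of reaching the same total. The guarantee disappears if an await is placed between the read and the write, which is why the rule is "more or less" rather than absolute.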
That wraps it up for this article. I’ll cover the IO aspects of asyncio
in
my next post, covering and comparing both the callback and coroutine
based approaches to using it. This is particularly important because one
area where coroutines really shine (vs threads) is where your application is
primarily IO-bound and so there’s no need to explode over multiple cores.
[1] As just one example of many issues, for extra credit[2] you might like to consider what happens to the rotate_daily()
implementation when it spans a DST change.
[2] Where the only credit to which I’m referring is SmugPoints(tm): a currency that sadly only really has any traction inside the privacy of your own skull.