The State of Python Coroutines: Introducing asyncio

16 Jun 2016 at 8:29AM in Software | Photo by Andy Pearce

I recently spotted that Python 3.5 has added yet more features to make coroutines more straightforward to implement and use. Since I’m well behind the curve I thought I’d bring myself back up to date over a series of blog posts, each going over some functionality added in successive Python versions — this one covers parts of the asyncio module that was added in Python 3.4.

This is the 2nd of the 4 articles that currently make up the “State of Python Coroutines” series.

In the previous post I discussed the state of coroutines in Python 2.x and then the yield from enhancement added in Python 3.3. Since that release there’s been a succession of improvements for coroutines and in this post I’m going to discuss those that were added as part of the asyncio module.

It’s a pretty large module and covers quite a wide variety of functionality, so covering all that with in-depth discussion and examples is outside the scope of this series of articles. I’ll try to touch on the finer points, however — in this article I’ll discuss the elements that are relevant to coroutines directly and then in the following post I’ll talk about the IO aspects.

History of asyncio

Python 2 programmers may recall the venerable asyncore module, which was added way back in the prehistory of Python 1.5.2. Its purpose was to assist in writing endpoints that handle IO from sources such as sockets asynchronously. To create clients you derive your own class from asyncore.dispatcher and override methods to handle events.

This was a helpful module for basic use-cases but it wasn’t particularly flexible if what you wanted didn’t quite match its structure. Generally I found I just ended up rolling my own polling loop based on things from the select module as I needed them (although if I were using Python 3.4 or above then I’d prefer the selectors module).
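
For a flavour of what such a hand-rolled loop looks like, here is a minimal sketch built on the selectors module. The function name poll_until_closed() and the use of a local socket pair as the IO source are assumptions made purely for illustration; a real loop would juggle many sockets and timers.

```python
import selectors
import socket


def poll_until_closed(payload):
    """Push bytes through a local socket pair and read them back via a selector."""
    selector = selectors.DefaultSelector()
    reader, writer = socket.socketpair()
    chunks = []
    try:
        reader.setblocking(False)
        selector.register(reader, selectors.EVENT_READ)
        writer.sendall(payload)
        writer.close()  # the reader will see end-of-file after the data
        finished = False
        while not finished:
            ready = selector.select(timeout=1.0)
            if not ready:
                break  # nothing became readable in time; give up
            for key, _mask in ready:
                data = key.fileobj.recv(4096)
                if data:
                    chunks.append(data)
                else:
                    # An empty read means the peer has closed its end.
                    selector.unregister(key.fileobj)
                    finished = True
    finally:
        selector.close()
        reader.close()
        writer.close()
    return b"".join(chunks)


if __name__ == "__main__":
    print(poll_until_closed(b"hello"))
```

Even in this tiny form the loop has to deal with timeouts and end-of-file by hand, which is the sort of bookkeeping a shared event loop takes off your plate.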

If you’re wondering why talk of an old asynchronous IO module is relevant to a series on coroutines, bear with me.

The limitations of asyncore were well understood and several third party libraries sprang up as alternatives, one of the most popular being Twisted. However, it was always a little annoying that such a common use-case wasn’t well catered for within the standard library.

Back in 2011 PEP 3153 was created to address this deficiency. It didn’t really have a concrete proposal, however; it just defined the requirements. Guido addressed this in 2012 with PEP 3156, and the fledgling asyncio library was born.

The library went through some iterations under the codename Tulip and a couple of years later it was included in the standard library of Python 3.4. This was on a provisional basis — this means that it’s there, it’s not going away, but the core developers reserve the right to make incompatible changes prior to it being finalised.

OK, still not seeing the link with coroutines? Well, as well as handling IO asynchronously, asyncio also has a handy event loop for scheduling coroutines. This is because the entire library is designed for use in two different ways depending on your preferences — either a more traditional callback-based scheme, where callbacks are invoked on events; or with a set of coroutines which can each block until there’s IO activity for them to process. Even if you don’t need to do IO, the coroutine scheduler is a useful piece that you don’t need to build yourself.
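
To make the two styles concrete, here is a small sketch that puts one callback and one coroutine on the same loop. The names run_both_styles(), on_timer() and waiter() are made up for illustration, and the coroutine is written with the later async def syntax so that it runs on current Python releases; under Python 3.4 it would be an @asyncio.coroutine generator using yield from.

```python
import asyncio


def run_both_styles():
    """Schedule one callback and one coroutine on a loop; return the event order."""
    events = []
    loop = asyncio.new_event_loop()

    # Callback style: the loop invokes a plain function after a delay.
    def on_timer(name):
        events.append("callback " + name)

    # Coroutine style: the function suspends itself until the delay has passed.
    async def waiter(name, delay):
        await asyncio.sleep(delay)
        events.append("coroutine " + name)

    try:
        loop.call_later(0.01, on_timer, "fired")
        loop.run_until_complete(waiter("resumed", 0.05))
    finally:
        loop.close()
    return events


if __name__ == "__main__":
    print(run_both_styles())
```

Both pieces of work are driven by the same loop; the only difference is whether the waiting is expressed as a registered function or as a suspension point inside a function.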

asyncio as a scheduler

At this point it would be helpful to consider a quick example of what asyncio can do on the scheduling front without worrying too much about IO — we’ll cover that in the next post.

In the example below, therefore, I’ve implemented something like logrotate. Mine is extremely simple[1] and doesn’t run off a configuration file, of course, because it’s just for demonstration purposes.

First here’s the code — see if you can work out what it does, then I’ll explain the finer points below.

import asyncio
import datetime
import errno
import os
import sys

def rotate_file(path, n_versions):
    """Create .1 .2 .3 etc. copies of the specified file."""

    if not os.path.exists(path):
        return
    for i in range(n_versions, 1, -1):
        old_path = "{0}.{1}".format(path, i - 1)
        if os.path.exists(old_path):
            os.rename(old_path, "{0}.{1}".format(path, i))
    os.rename(path, path + ".1")


@asyncio.coroutine
def rotate_by_interval(path, keep_versions, rotate_secs):
    """Rotate file every N seconds."""

    while True:
        yield from asyncio.sleep(rotate_secs)
        rotate_file(path, keep_versions)


@asyncio.coroutine
def rotate_daily(path, keep_versions):
    """Rotate file every midnight."""

    while True:
        now = datetime.datetime.now()
        last_midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
        next_midnight = last_midnight + datetime.timedelta(1)
        yield from asyncio.sleep((next_midnight - now).total_seconds())
        rotate_file(path, keep_versions)


@asyncio.coroutine
def rotate_by_size(path, keep_versions, max_size, check_interval_secs):
    """Rotate file when it exceeds N bytes checking every M seconds."""

    while True:
        yield from asyncio.sleep(check_interval_secs)
        try:
            file_size = os.stat(path).st_size
            if file_size > max_size:
                rotate_file(path, keep_versions)
        except OSError as exc:
            if exc.errno != errno.ENOENT:
                raise


def main(argv):

    loop = asyncio.get_event_loop()
    # Would normally read this from a configuration file.
    rotate1 = loop.create_task(rotate_by_interval("/tmp/file1", 3, 30))
    rotate2 = loop.create_task(rotate_by_interval("/tmp/file2", 5, 20))
    rotate3 = loop.create_task(rotate_by_size("/tmp/file3", 3, 1024, 60))
    rotate4 = loop.create_task(rotate_daily("/tmp/file4", 5))
    loop.run_forever()


if __name__ == "__main__":
    sys.exit(main(sys.argv))

Each file rotation policy that I’ve implemented is its own coroutine. Each one operates independently of the others and the underlying rotate_file() function is just to refactor out the common task of actually rotating the files. In this case they all delegate their waiting to the asyncio.sleep() function as a convenience, but it would be equally possible to write a coroutine which does something more clever, like hook into inotify, for example.
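
To see what the shared rotate_file() helper actually does to the files on disk, the following sketch repeats it unchanged from the listing above and exercises it in a scratch directory. The demo_rotation() wrapper and the app.log filename are hypothetical, and the behaviour when the oldest copy is overwritten assumes POSIX rename semantics (on Windows os.rename() refuses to replace an existing file).

```python
import os
import tempfile


def rotate_file(path, n_versions):
    """Same logic as rotate_file() in the listing above."""
    if not os.path.exists(path):
        return
    for i in range(n_versions, 1, -1):
        old_path = "{0}.{1}".format(path, i - 1)
        if os.path.exists(old_path):
            os.rename(old_path, "{0}.{1}".format(path, i))
    os.rename(path, path + ".1")


def demo_rotation(rounds, n_versions):
    """Write and rotate a scratch file several times; return {filename: contents}."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.log")
        for round_no in range(1, rounds + 1):
            with open(path, "w") as handle:
                handle.write("round {0}".format(round_no))
            rotate_file(path, n_versions)
        result = {}
        for name in sorted(os.listdir(tmp)):
            with open(os.path.join(tmp, name)) as handle:
                result[name] = handle.read()
        return result


if __name__ == "__main__":
    # Four rotations while keeping three versions: the first round's data is dropped.
    print(demo_rotation(4, 3))
```

After four rounds only the three most recent versions survive, with .1 always holding the newest.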

You can see that main() just creates a bunch of tasks and plugs them into an event loop, then asyncio takes care of the scheduling. This script is designed to run until terminated so it uses the simple run_forever() method of the loop, but there are also methods to run until a particular coroutine completes or just wait for one or more specific futures.
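
As a rough illustration of those alternatives, this sketch runs a loop only until a given coroutine finishes, first for a single coroutine and then for a pair combined with asyncio.gather(), which is one of several helpers for waiting on multiple things. The delayed_value() worker and the other function names are invented for the example, and the code uses the later async def syntax so that it runs on current Python releases.

```python
import asyncio


async def delayed_value(value, delay):
    """Hypothetical worker: sleep briefly, then hand back a value."""
    await asyncio.sleep(delay)
    return value


async def gather_values():
    """Wait for several coroutines at once; results come back in argument order."""
    return await asyncio.gather(
        delayed_value("slow", 0.02),
        delayed_value("fast", 0.01),
    )


def run_to_completion():
    loop = asyncio.new_event_loop()
    try:
        # Unlike run_forever(), this returns once the coroutine has finished.
        single = loop.run_until_complete(delayed_value("only", 0.01))
        several = loop.run_until_complete(gather_values())
    finally:
        loop.close()
    return single, several


if __name__ == "__main__":
    print(run_to_completion())
```

Note that gather() reports results in the order the coroutines were passed in, not the order in which they finished.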

Under the hood the @asyncio.coroutine decorator marks the function as a coroutine such that asyncio.iscoroutinefunction() returns True — this may be required for disambiguation in parts of asyncio where the code needs to handle coroutines differently from regular callback functions. The create_task() call then wraps the coroutine instance in a Task class — Task is a subclass of Future and this is where the coroutine and callback worlds meet.
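
These relationships are easy to check interactively. The sketch below is only an illustration: answer(), plain() and inspect_task() are made-up names, and the coroutine is declared with async def rather than the Python 3.4 decorator so that it runs on current releases, where newer versions also steer you towards inspect.iscoroutinefunction() for the same check.

```python
import asyncio


async def answer():
    return 42


def plain():
    return 42


def inspect_task():
    """Collect a few facts about coroutine functions, Tasks and Futures."""
    loop = asyncio.new_event_loop()
    try:
        # create_task() wraps the coroutine object in a Task and schedules it.
        task = loop.create_task(answer())
        facts = {
            "coroutine function": asyncio.iscoroutinefunction(answer),
            "plain function": asyncio.iscoroutinefunction(plain),
            "task is a Future": isinstance(task, asyncio.Future),
            "Task subclasses Future": issubclass(asyncio.Task, asyncio.Future),
        }
        # Running the loop until the task is done yields the coroutine's result.
        facts["result"] = loop.run_until_complete(task)
    finally:
        loop.close()
    return facts


if __name__ == "__main__":
    print(inspect_task())
```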

An asyncio.Future represents the future result of an asynchronous process. Completion callbacks can be registered with it using the add_done_callback() method. When the asynchronous result is ready it’s passed to the Future with the set_result() method, at which point any registered completion callbacks are scheduled for invocation by the event loop. It’s easy to see, then, how the Task class is a simple wrapper which waits for the result of its wrapped coroutine to be ready and passes it to the parent Future class for invocation of callbacks. In this way, the coroutine and callback worlds can coexist quite happily. In fact, in many ways the coroutine interface is a layer implemented on top of the callbacks. It’s a pretty crucial layer in making the whole thing cleaner and more manageable for the programmer, however.
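
Here is a minimal sketch of a single Future being consumed from both worlds at once: a completion callback is registered on it and a coroutine waits on it. The function names are hypothetical, a timer stands in for whatever asynchronous process would really produce the result, and the coroutine uses the later async def syntax so that the example runs on current Python releases.

```python
import asyncio


def future_with_callback():
    """Complete a Future via set_result(); return (awaited value, callback log)."""
    loop = asyncio.new_event_loop()
    callback_log = []

    async def consumer(future):
        # Coroutine side: suspend until the future has a result.
        return await future

    try:
        future = loop.create_future()
        # Callback side: the callback is handed the completed future.
        future.add_done_callback(lambda fut: callback_log.append(fut.result()))
        # Stand-in for an asynchronous producer delivering the result later.
        loop.call_later(0.01, future.set_result, "ready")
        awaited = loop.run_until_complete(consumer(future))
    finally:
        loop.close()
    return awaited, callback_log


if __name__ == "__main__":
    print(future_with_callback())
```

The same set_result() call both triggers the callback and wakes the waiting coroutine, which is the sense in which the two interfaces share one mechanism.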

The part that links it all together is the event loop, which asyncio just gives you for free. There are a few details I’ve glossed over, however, since they’re not too important for a basic understanding. One thing to be aware of is that there are currently two event loop implementations: most people will be using SelectorEventLoop, but on Windows there’s also the ProactorEventLoop which uses different underlying primitives and has different tradeoffs.
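
If you ever need to choose between the two explicitly, something along these lines should do it. The make_loop() helper and its prefer_proactor flag are assumptions for the sake of the sketch, not part of asyncio; the two loop classes themselves are real, with ProactorEventLoop only being defined on Windows.

```python
import asyncio
import sys


def make_loop(prefer_proactor=False):
    """Pick an event loop class explicitly instead of relying on the default."""
    if prefer_proactor and sys.platform == "win32":
        return asyncio.ProactorEventLoop()  # only defined on Windows
    return asyncio.SelectorEventLoop()


if __name__ == "__main__":
    loop = make_loop()
    try:
        print(type(loop).__name__)
        # The chosen loop is used exactly like any other.
        print(loop.run_until_complete(asyncio.sleep(0, result="ok")))
    finally:
        loop.close()
```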

This scheduling may all seem simplistic, and it’s true that in this example asyncio isn’t doing anything hugely difficult. But building your own event loop isn’t quite as trivial as it sounds — there are quite a few gotchas that can trip you up and leave your code locked up or sleeping forever. This is particularly acute when you introduce IO into the equation, where there are some slightly surprising edge cases that people often miss such as handling sockets which have performed a remote shutdown. Also, this approach is quite modular and manages to produce single-threaded code where different asynchronous operations interoperate with little or no awareness of each other. This can also be achieved with threading, of course, but this way we don’t need locks and we can more or less rule out issues such as race conditions and deadlocks.
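
The point about locks can be shown with a small sketch in which several coroutines update shared state using a plain read-modify-write. The names add_many() and run_workers() are invented for the example, and it uses the later async def syntax and asyncio.run() so that it runs on current Python releases.

```python
import asyncio


async def add_many(state, times):
    for _ in range(times):
        # Read-modify-write with no lock: nothing else runs between these two
        # lines because there is no suspension point (await) among them.
        current = state["total"]
        state["total"] = current + 1
        # Yield to the scheduler so the other coroutines get a turn.
        await asyncio.sleep(0)


async def run_workers(workers, times):
    """Run several add_many() coroutines over one shared dictionary."""
    state = {"total": 0}
    await asyncio.gather(*(add_many(state, times) for _ in range(workers)))
    return state["total"]


if __name__ == "__main__":
    print(asyncio.run(run_workers(5, 100)))
```

Control only changes hands at the explicit suspension points, so no update is lost. The guarantee would no longer hold if an await were placed between the read and the write, so it relies on keeping each critical section free of suspension points.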

That wraps it up for this article. I’ll cover the IO aspects of asyncio in my next post, comparing both the callback and coroutine based approaches to using it. This is particularly important because one area where coroutines really shine (vs threads) is where your application is primarily IO-bound and so there’s no need to explode over multiple cores.


  1. In just one example of many issues, for extra credit[2] you might like to consider what happens to the rotate_daily() implementation when it spans a DST change. 

  2. Where the only credit to which I’m referring is SmugPoints(tm): a currency that sadly only really has any traction inside the privacy of your own skull. 

The next article in the “State of Python Coroutines” series is The State of Python Coroutines: asyncio - Callbacks vs. Coroutines
Tue 5 Jul, 2016