☑ What’s New in Python 3.9 - New Features

27 Oct 2022 at 7:14PM in Software

In this series looking at features introduced by every version of Python 3, we move on to Python 3.9 and examine some of the major new features. These include type hinting generics in standard collections, string methods for stripping specified prefixes and suffixes from strings, extensions to function and variable annotations, and new modules for timezone information and topological sorting of graphs.

This is the 19th of the 22 articles that currently make up the “Python 3 Releases” series.


Slowly catching up to reality, I’m now taking a look at the new features added in Python 3.9. This was released on October 5th 2020, which is around two years ago as I write this and almost exactly a year after 3.8 was released, in keeping with the new 12-month release cycle.

This release has quite a few nice changes which improve the convenience of type hinting and add some useful new functionality. At long last the standard library now contains time zone information, for example, which has been an annoyance of mine for so long I can’t even remember when it first annoyed me.

One thing that’s worth noting is that Python 3.9 is the final release to offer a number of Python 2 compatibility layers which have been preserved thus far to ease migration. If you’re relying on these then you should have been seeing DeprecationWarning or PendingDeprecationWarning for them for a while, so their removal shouldn’t come as a surprise unless you’ve not been testing your code for these[1]. In fact, some things have already been removed in 3.9; you can find a full list in the release notes.

Let’s jump in and look at the changes, then — as usual, I’ll kick off with the core language changes.

Syntax Changes

In terms of new syntax, there are some changes to the dict built-in to allow union and merge operations, as well as some changes to allow decorator expressions to be more flexible. This release also adds the ability to use built-in collections directly for type hints without needing to pull equivalent classes from typing, and as part of this change some more types have been made generic, such as queue.Queue. We’ll look at all of these in more detail in the subsections below.

Dictionary Merge and Update Operators

The built-in set class has long supported the | and |= operators for a union operation.

>>> x = {1,2,3,4,5}
>>> x | {2,4,6,8,10}
{1, 2, 3, 4, 5, 6, 8, 10}
>>> x
{1, 2, 3, 4, 5}
>>> x |= {2,4,6,8,10}
>>> x
{1, 2, 3, 4, 5, 6, 8, 10}

In this release, the dict class acquires similar semantics with these same two operators, detailed in PEP 584. In previous releases there have been some ways to achieve this, the main ones being dict unpacking and the dict.update() method.

>>> x = {1: 11, 2: 22, 3: 33}
>>> # Use dict unpacking
>>> {**x, **{2: 22, 4: 44, 6: 66}}
{1: 11, 2: 22, 3: 33, 4: 44, 6: 66}
>>> # Use dict.update()
>>> x.update({2: 22, 4: 44, 6: 66})
>>> x
{1: 11, 2: 22, 3: 33, 4: 44, 6: 66}

Of these, dict unpacking is fairly abstruse and dict.update(), whilst fairly easy to understand, is only suitable for updating existing instances and not creating a new temporary with the merge of two other values.

To provide a more elegant solution, the dict class now supports the same | and |= operators to merge dictionaries.

>>> {1: 11, 2: 22} | {2: 222, 3: 333} | {3: 3333, 4: 4444}
{1: 11, 2: 222, 3: 3333, 4: 4444}
>>> x = {1: "one", 2: "two", 3: "three"}
>>> x |= {3: "trois", 4: "quatre", 5: "cinq"}
>>> x
{1: 'one', 2: 'two', 3: 'trois', 4: 'quatre', 5: 'cinq'}

This is very similar to the case with set, the main difference being that since dict keys are associated with values, replacing a key has more subtleties. The semantics that have been chosen are that in case of a conflict, the rightmost definition in the merge takes precedence; you can see several examples of this in the snippet above. This seems sensible, since it matches the existing behaviour of duplicate keys when specifying a dict using an iterable of 2-tuples.

>>> dict(((1, "one"), (2, "two"), (1, "un")))
{1: 'un', 2: 'two'}

One interesting point to note is that keys of different types which compare equal also hash to the same value, and hence replace each other in a dict assignment: the existing key is kept but its value is overwritten. Since the rightmost value always wins, the | operator is not commutative.

>>> x = {1: 111}
>>> x |= {1.0: 222}
>>> x
{1: 222}
>>> x |= {True: 333}
>>> x
{1: 333}
>>> {1: 111} | {1.0: 222} == {1.0: 222} | {1: 111}
False
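As a small illustration of how these operators might be used in practice, here is a sketch which layers some overrides on top of default settings. The names defaults, overrides and settings are invented for this example. It also shows one asymmetry described in PEP 584: |= accepts any iterable of key/value pairs, just as dict.update() does, whereas | requires both operands to be dicts.

```python
# Hypothetical settings, purely for illustration.
defaults = {"host": "localhost", "port": 8080, "debug": False}
overrides = {"port": 9090, "debug": True}

# | builds a new dict and leaves both operands untouched.
merged = defaults | overrides

# |= updates in place and accepts any iterable of (key, value) pairs.
settings = dict(defaults)
settings |= [("timeout", 30)]

# | itself is stricter: a list on the right-hand side raises TypeError.
try:
    defaults | [("timeout", 30)]
except TypeError:
    list_operand_rejected = True
else:
    list_operand_rejected = False

print(merged)
print(settings)
print(list_operand_rejected)
```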

Type Hinting Generics in Standard Collections

The next syntax change is a handy improvement to type hinting specified by PEP 585. Prior to Python 3.9, many type hints had to use objects from the typing module which mirrored real Python data types. As well as builtin types like list and dict, other common examples include the abstract base classes in the collections.abc module. Using these classes from typing was workable, but certainly not convenient.

As of Python 3.9, however, these types have been made generic and can be used directly in type hints instead of their proxies within typing. In Python 3.8, you can see that using typing works but the builtin types don’t:

Python 3.8.14 (default, Sep  6 2022, 23:26:50)
[Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import typing
>>> typing.List[str]
typing.List[str]
>>> typing.Dict[int, str]
typing.Dict[int, str]
>>> list[str]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'type' object is not subscriptable
>>> dict[int, str]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'type' object is not subscriptable

In Python 3.9, however, although typing still works, it’s generally suggested to use the type objects themselves:

Python 3.9.14 (main, Sep  6 2022, 23:29:09)
[Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import typing
>>> typing.List[str]
typing.List[str]
>>> list[str]
list[str]
>>> dict[int, str]
dict[int, str]

Using the typing versions won’t generate any deprecation warnings in Python itself, but type checkers may let you know if you’re using them. However, the typing versions are guaranteed to be included until at least the Python 3.9 end-of-life, currently scheduled for October 2025.

This may seem like a minor change, but I think it’s an important one for encouraging more people to benefit from type hinting, as it significantly lowers the learning curve.
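To show what this looks like in an actual annotation, here is a minimal sketch of a function using the built-in dict and the collections.abc.Iterable class directly as generics. The function count_words() is invented for this example, and it requires Python 3.9 or later to run.

```python
from collections.abc import Iterable


def count_words(lines: Iterable[str]) -> dict[str, int]:
    """Count occurrences of each whitespace-separated word."""
    counts: dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


print(count_words(["spam eggs", "spam"]))
```

Nothing from typing needs to be imported here, which is rather the point.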

Relaxed Decorator Grammar

In versions of Python prior to 3.9, the syntax for decorators was more limited than was strictly required. Decorators had to be a single qualified name, optionally followed by a single call.

# This was permitted
@foo.bar.baz()
def func():
    ...

# But these were invalid
@foo().bar()
def func():
    ...

@item_list[2].decorator
def func():
    ...

It turns out that there was no real technical reason for these requirements; it was just a “gut feeling” from Guido which spurred the restrictions. Since then there have been a number of requests to lift these restrictions, and in Python 3.9 this has finally happened under the auspices of PEP 614.

So, any expression is now a valid decorator, provided that it evaluates to a callable object which takes a single parameter and returns the decorated callable. In the example above, all three would be valid, presuming appropriate definitions of the referenced variables.
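As a runnable sketch of the relaxed grammar, the example below picks a decorator out of a dict with a subscript expression, which was a syntax error before Python 3.9. The decorators shout() and identity(), and the decorators dict itself, are invented for this example.

```python
def shout(func):
    """Decorator which upper-cases the wrapped function's result."""
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper


def identity(func):
    """Decorator which leaves the function unchanged."""
    return func


# Hypothetical registry of decorators, keyed by name.
decorators = {"loud": shout, "quiet": identity}


# A subscript expression used directly as a decorator (3.9+ only).
@decorators["loud"]
def greet(name):
    return f"hello {name}"


@decorators["quiet"]
def whisper(name):
    return f"hello {name}"


print(greet("andy"))
print(whisper("andy"))
```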

I suspect this change is a little too esoteric to impact a lot of developers; where they do use decorators, I suspect they use them in a fairly simple way. Still, on the rare occasions the flexibility is needed, this change should remove the need for less elegant hacks.

Language Changes

As well as the syntax changes outlined above, there are a couple of handy new methods on str, some small changes to async support, and a collection of smaller improvements. The subsections below go over these in more detail.

Removing String Prefixes and Suffixes

Something that comes up fairly commonly with strings is the need to strip off leading and trailing parts. For example, a very common case is to call the strip() method on a string to strip leading and trailing whitespace. The strip() method can also remove other characters than whitespace, but it works at the character level — sometimes you need to remove more complex strings.

As an example, imagine you had some code which wanted to store data under a hostname. If provided with "andy-pearce.com" then it should use that directly, but if given "www.andy-pearce.com" it should strip off the leading "www.". This is straightforward in Python, but requires use of a string length which is inelegant.

def strip_www(hostname):
    if hostname.startswith("www."):
        return hostname[4:]
    else:
        return hostname

As of Python 3.9, however, there are two new string methods, removeprefix() and removesuffix(), which do precisely this: if the string starts or ends with the specified string, a copy with it stripped is returned; otherwise, the string is returned unchanged.

Using these, the function above becomes so simple, you probably wouldn’t even bother writing your own function for it:

def strip_www(hostname):
    return hostname.removeprefix("www.")

These may not be earth-shattering in their utility, but these operations are common enough that I definitely appreciate them being added. Some will no doubt cry that the re module can be used for this purpose, as it can for almost any string manipulation task. But to have to resort to regular expressions for such a simple manipulation is unpleasant — in the words of Jeff Atwood, regular expressions are like hot sauce, to be used in moderation and only when appropriate.

Also, as PEP 616 points out, those learning Python for the first time are encouraged to be sceptical of any use of manual slicing or indexing into a string, as often there’s a more elegant higher-level function which is preferable. These methods remove one of the few remaining common use-cases for manual slicing.
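To make the difference from the character-level strip family concrete, here is a short sketch comparing lstrip() with removeprefix() on a hostname chosen (purely for illustration) because it begins with extra "w" characters after the prefix.

```python
hostname = "www.wwwidgets.com"

# lstrip() treats its argument as a set of characters, so it keeps
# eating any leading "w" or "." it finds.
stripped_chars = hostname.lstrip("www.")

# removeprefix() removes the exact string once, and only if present.
stripped_prefix = hostname.removeprefix("www.")

# With no matching prefix or suffix the string is returned unchanged.
unchanged = "andy-pearce.com".removeprefix("www.")
no_extension = "notes.txt".removesuffix(".txt")

print(stripped_chars)   # idgets.com
print(stripped_prefix)  # wwwidgets.com
print(unchanged)        # andy-pearce.com
print(no_extension)     # notes
```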

Coroutine Changes

There’s been a small change to prevent an issue with async generators. Consider the following script, taken from bpo-30773.

import asyncio

loop = asyncio.get_event_loop()


async def consumer():
    while True:
        await asyncio.sleep(0)
        message = yield
        print('received', message)


async def amain():
    agenerator = consumer()
    await agenerator.asend(None)

    fa = asyncio.create_task(agenerator.asend('A'))
    fb = asyncio.create_task(agenerator.asend('B'))
    await fa
    await fb


loop.run_until_complete(amain())

This creates an instance of the consumer async generator, and starts it by sending None. Then it creates two tasks which send values to the consumer, and we expect these both to be printed.

However, if you run this against previous Python versions you’ll see something unexpected:

$ python3.7 /tmp/broken_async.py
received A
received None

The key aspect here is the await asyncio.sleep(0) on line 8 of the script: if you remove it, you’ll see both values printed as you’d expect. This await allows the two asend() calls to run concurrently on the same generator, due to confusion about the currently running state, and it’s this which causes the issue. The fix has been to mark the generator as running over the course of the entire outer asend() call, and likewise for athrow() and aclose(). This prevents several of these calls from running concurrently on the same generator; if that is attempted, an exception is raised.

Here’s the output of running exactly the same script through Python 3.9.

$ python3.9 /tmp/broken_async.py
received A
Traceback (most recent call last):
  File "/Users/andy/broken_async.py", line 23, in <module>
    loop.run_until_complete(amain())
  File ".../3.9/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/Users/andy/broken_async.py", line 20, in amain
    await fb
RuntimeError: anext(): asynchronous generator is already running

As a side-effect of this, the ag_running attribute now more accurately reflects the running status of the generator.
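If you do need several tasks to feed the same async generator, one possible approach (my own suggestion, not something taken from the bug report) is to serialise the asend() calls with an asyncio.Lock so that only one is ever in flight. The sketch below adapts the script above along those lines; the received list and the send() helper are additions for illustration.

```python
import asyncio


async def consumer(received):
    while True:
        await asyncio.sleep(0)
        message = yield
        received.append(message)


async def amain():
    received = []
    agenerator = consumer(received)
    await agenerator.asend(None)

    # The lock ensures only one asend() runs on the generator at a time.
    lock = asyncio.Lock()

    async def send(message):
        async with lock:
            await agenerator.asend(message)

    await asyncio.gather(send("A"), send("B"))
    await agenerator.aclose()
    return received


result = asyncio.run(amain())
print(result)
```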

Encoding Parameter Checks

It’s possible to specify character encodings via the encoding parameter in a few places, such as open(), str.encode() and bytes.decode(). It’s also possible to specify an error handler with the errors parameter, which controls what these routines will do in response to encoding/decoding errors.

With that said, I’ll tell you the code below has a bug. Take a moment to look and see whether you can see it, and further whether you’d catch it during either unit testing or a regular code review.

Would you find the bug?
def read_configuration_file(config_file_path):
    """Read and parse the config file and yield (key, value) pairs."""

    with open(config_file_path, "r", encoding="utf-8", errors="surrogate") as fd:
        for line in (l.split("#", 1)[0] for l in fd):
            if ":" in line:
                yield tuple(i.strip() for i in line.split(":", 1))

The bug in the above code is that it should be errors="surrogateescape"[2], but I’d be sceptical of any code reviewer who claimed they’d spot that reliably, or of the code author who claimed they’d always add unit test cases to provoke decoding errors and so trigger the bug.

The challenge here is that errors is a plain str and so no type hints will help you. Furthermore, the invalid error handler will only be detected the first time there’s a decoding error; this is the sort of bug that could easily sneak through to production and only bite you weeks, months or even years later. In principle you could say the same about encoding, although that’s much more likely to be caught in even casual testing earlier.

As of Python 3.9, however, if you’re using the development mode introduced in Python 3.7 then these parameters will both be validated on the first instance of each such call. Compare the two invocations below, with and without development mode enabled.

$ python3.9 -c 'print("foo".encode(encoding="utf-8", errors="spam"))'
b'foo'
$ python3.9 -X dev -c 'print("foo".encode(encoding="utf-8", errors="spam"))'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
LookupError: unknown error handler name 'spam'

This also highlights the utility of running your unit test suites under development mode. If you’re not already doing so, it’s a very easy change with no real downside that I can see. Even if your unit test coverage is very low, the additional checks performed in development mode amplify its value.
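If you can’t rely on development mode being enabled, a complementary approach is to validate the names yourself at the point they’re configured. The sketch below uses codecs.lookup() and codecs.lookup_error() from the standard library, both of which raise LookupError for an unknown name; the wrapper function check_codec_args() is invented for this example.

```python
import codecs


def check_codec_args(encoding: str, errors: str = "strict") -> None:
    """Raise LookupError if either name is not registered."""
    codecs.lookup(encoding)       # Validates the encoding name.
    codecs.lookup_error(errors)   # Validates the error handler name.


# A valid combination passes silently.
check_codec_args("utf-8", "surrogateescape")

# The typo from the example above is caught immediately.
try:
    check_codec_args("utf-8", "surrogate")
except LookupError as exc:
    caught = str(exc)
else:
    caught = None

print(caught)
```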

Absolute __file__ in __main__

Until now, the __file__ attribute of the __main__ module would be whatever filename was specified on the command-line of the Python interpreter. This was inconsistent with how it worked for all other modules, where __file__ is an absolute path, and caused annoyances with things like finding other files relative to the script file’s location.

In Python 3.9 this has been changed, and the __file__ of your script should always be an absolute path regardless of what was specified on the command-line.

test.py
print(f"Path: {__file__}")
Output
$ python3.8 test.py
Path: test.py
$ python3.9 test.py
Path: /Users/andy/test.py
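For code which has to run on versions either side of this change, a common defensive pattern is to resolve the path explicitly before using it. The sketch below shows the idea with pathlib; the helper sibling_path() and the file names are invented for illustration. Bear in mind that on older versions resolving a relative __file__ only gives the right answer if the working directory hasn’t changed since the interpreter started.

```python
from pathlib import Path


def sibling_path(script_file: str, name: str) -> Path:
    """Return the absolute path of a file alongside the given script."""
    return Path(script_file).resolve().parent / name


# In a real script you'd pass __file__ as the first argument.
print(sibling_path("/srv/app/test.py", "settings.ini"))
```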

New Modules

There are two new modules in Python 3.9, zoneinfo and graphlib. They’re both useful in their own way, but it’s the first of these which is much more impactful in my view.

zoneinfo

Einstein’s theories of relativity tell us that time is more complicated than we thought when objects are travelling at high velocities relative to each other. As any programmer who’s had to deal with time across time zones will know, however, it’s complicated enough even when you’re sitting still. If you’ve dealt with it long enough, you’ll know the only thing travelling at high velocity will be your keyboard out of the window in frustration.

Still, despite the fact that time zones appear to have been designed with annoying programmers as their primary goal, they are a fact of coding in the real world and we have to deal with them. Thankfully, this has now become a little more convenient with PEP 615 adding zoneinfo into the standard library.

Before I take a look at the module itself, you may wonder why I think this is a big deal, given the availability of support in third party libraries. Well, the first reason is that the existing libraries aren’t always great — the widely used pytz module has a number of annoying gotchas, for example. The dateutil.tz module is closer to Python’s own datetime module, but my second issue is that it creates external dependencies for what should be a standard feature — Python is intended as a batteries included language, and time zone handling is something I regard as fairly core.

In any case, let’s take a look at what zoneinfo offers us. Succinctly, it exposes the IANA time zone database via the existing datetime.tzinfo interface. What does this mean in real terms? Well, without any external libraries we can now do things like this.

>>> from datetime import datetime, timedelta, timezone
>>> from zoneinfo import ZoneInfo
>>>
>>> # This time is after British Summer Time (BST) ends
>>> x = datetime(2022, 10, 31, 18, 30, tzinfo=ZoneInfo("Europe/London"))
>>> str(x)
'2022-10-31 18:30:00+00:00'
>>> x.tzname()
'GMT'
>>>
>>> # But it's before US summer time ends
>>> y = x.astimezone(ZoneInfo("America/New_York"))
>>> str(y)
'2022-10-31 14:30:00-04:00'
>>> y.tzname()
'EDT'
>>>
>>> # Moving back 48 hours brings it into BST 
>>> z = x - timedelta(seconds=86400*2)
>>> str(z)
'2022-10-29 18:30:00+01:00'
>>> z.tzname()
'BST'
>>>
>>> # Note the timedelta is respected per local time not UTC
>>> x - z
datetime.timedelta(days=2)
>>> x.astimezone(timezone.utc) - z.astimezone(timezone.utc)
datetime.timedelta(days=2, seconds=3600)

This all seems fairly straightforward, with the potential exception of the point about timedelta being applied to local time and not UTC. What is perhaps more subtle, though not unexpected on reflection, is that the module respects the local time disambiguation introduced into Python 3.6. In the previous article where I discussed this, I used dateutil.tz to demonstrate it — here is the same example with a minor update to use zoneinfo instead.

>>> from datetime import datetime, timedelta, timezone
>>> from zoneinfo import ZoneInfo
>>>
>>> base_dt = datetime(2022, 10, 29, 23, 35, tzinfo=timezone.utc)
>>> for i in range(8):
...     ut = base_dt + timedelta(seconds = i * 30 * 60)
...     lt = ut.astimezone(ZoneInfo("Europe/London"))
...     print(f"UTC:{ut.time()} London:{lt.time()} {lt.tzname()} Fold:{lt.fold}")
...
UTC:23:35:00 London:00:35:00 BST Fold:0
UTC:00:05:00 London:01:05:00 BST Fold:0
UTC:00:35:00 London:01:35:00 BST Fold:0
UTC:01:05:00 London:01:05:00 GMT Fold:1
UTC:01:35:00 London:01:35:00 GMT Fold:1
UTC:02:05:00 London:02:05:00 GMT Fold:0
UTC:02:35:00 London:02:35:00 GMT Fold:0
UTC:03:05:00 London:03:05:00 GMT Fold:0
>>>
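The fold attribute also works in the other direction, when you construct an ambiguous local time yourself. As a sketch, the function below builds 01:30 on 30 October 2022 in London, a wall-clock time which occurred twice, and returns the UTC offset for each reading. The function name ambiguous_offsets() is invented for this example, and it assumes time zone data is available on the system.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo


def ambiguous_offsets(key="Europe/London"):
    """Return the UTC offsets of both readings of 01:30 on 30 Oct 2022."""
    zone = ZoneInfo(key)
    # fold=0 (the default) selects the first occurrence, still in BST.
    first = datetime(2022, 10, 30, 1, 30, tzinfo=zone)
    # fold=1 selects the second occurrence, after the clocks went back.
    second = first.replace(fold=1)
    return first.utcoffset(), second.utcoffset()
```

With the IANA database present, calling ambiguous_offsets() should give an offset of one hour for the first reading and zero for the second.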

There are a few other details that are worth knowing. Firstly, a function zoneinfo.available_timezones() returns a set of the available timezone keys.

>>> import random
>>> import zoneinfo
>>>
>>> zones = zoneinfo.available_timezones()
>>> print("\n".join(random.sample(sorted(zones), 5)))
Africa/Lagos
Pacific/Marquesas
America/New_York
Pacific/Midway
Pacific/Bougainville

The next interesting facet is that the zones themselves act a little like singletons — if you pass the same value to the ZoneInfo constructor twice, you get objects which will compare the same with is. This only applies if the same name is passed — another city within the same timezone will not exhibit this behaviour. This is implemented by keeping a cache of extant values around for as long as any reference exists, but can be bypassed using the no_cache() constructor if you really want to.

>>> ZoneInfo("America/New_York") is ZoneInfo("America/New_York")
True
>>> ZoneInfo.no_cache("America/New_York") is ZoneInfo("America/New_York")
False

This is probably only of use during testing, but there is a rare edge case where it might be helpful for very long-running applications. If the timezone data on disk is updated whilst the application is running, the caching behaviour means that the old values will always be used even after the update. If you force a cache miss with no_cache(), however, you’ll get an entry created from the updated timezone data. There’s also a static ZoneInfo.clear_cache() method to wipe the cache clean, which is probably more appropriate for this purpose.

Speaking of the underlying timezone data, there are some subtleties worth knowing about there too. For ease of system administration, a system-installed database will be used if present; this improves the chances of receiving updates, as the time zone data updates more often than you’d think. The module looks for this database across a search path which you can examine (but not change) by reading zoneinfo.TZPATH.

In cases where there is no system database, such as on Windows, which has its own system for timezone information, there is a first-party tzdata package distributed via PyPI. This does rather remove the advantage of being included in the core distribution, but since most Unix-like platforms already have the IANA database pre-installed it’s generally not an issue unless you need to support Windows.
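If your code might run somewhere that has neither a system database nor the tzdata package, it’s worth deciding up front what should happen when a zone can’t be found. The sketch below catches ZoneInfoNotFoundError and falls back to a caller-supplied default; the helper load_zone() is invented for this example, and whether silently falling back to UTC is acceptable depends very much on your application.

```python
from datetime import timezone, tzinfo
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def load_zone(key: str, fallback: tzinfo = timezone.utc) -> tzinfo:
    """Return the named zone, or the fallback if no data can be found."""
    try:
        return ZoneInfo(key)
    except ZoneInfoNotFoundError:
        # Raised when the key is in neither TZPATH nor the tzdata package.
        return fallback


# A key which doesn't exist anywhere always yields the fallback.
print(load_zone("Not/A_Real_Zone"))
```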

If you really do need to change TZPATH — for example, to disable the system database and force use of the tzdata package — then you have three options:

  1. Rebuild Python with a different default search path, which is set at compile time using the --with-tzpath configure option.
  2. Run your code with the PYTHONTZPATH environment variable set to the new path.
  3. Use the zoneinfo.reset_tzpath() function to change the value at runtime.

If you really must use the final option, do bear in mind the caching means that you’ll need to clear the cache if you want to use the updated version. However, you’ll also need to solve the problem of clearing out any extant references that still exist anywhere else in your code. As usual, things are generally easier if you just choose to restart your service at least once a day or week instead of trying all these tricky runtime patches.

>>> import zoneinfo
>>>
>>> zoneinfo.ZoneInfo("Europe/London")
zoneinfo.ZoneInfo(key='Europe/London')
>>>
>>> # Clear out TZPATH to cut off system database
>>> zoneinfo.reset_tzpath(())
>>>
>>> # New zones now fail, but the cache remains
>>> zoneinfo.ZoneInfo("America/New_York")
Traceback (most recent call last):
    # ... Traceback omitted for brevity ...
ModuleNotFoundError: No module named 'tzdata'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
    # ... Traceback omitted for brevity ...
zoneinfo._common.ZoneInfoNotFoundError: 'No time zone found with key America/New_York'
>>> zoneinfo.ZoneInfo("Europe/London")
zoneinfo.ZoneInfo(key='Europe/London')
>>>
>>> # Clearing the cache removes cached instances too
>>> zoneinfo.ZoneInfo.clear_cache()
>>> zoneinfo.ZoneInfo("Europe/London")
Traceback (most recent call last):
    # ... Traceback omitted for brevity ...
ModuleNotFoundError: No module named 'tzdata'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
    # ... Traceback omitted for brevity ...
zoneinfo._common.ZoneInfoNotFoundError: 'No time zone found with key Europe/London'

So that’s all I’m going to cover in zoneinfo. Most people won’t need to worry too much about a lot of those wrinkles, but I always think it’s nice to know what facilities are there because you never quite know when they might be useful. Overall I’m really happy to see this addition to the library, it’s one more thing that makes Python such a convenient language.

graphlib

The other new module in this release is graphlib which provides facilities to manipulate graph-like structures. Well, I use the word “facilities” a little loosely — it currently contains exactly one facility, which is the TopologicalSorter class to perform a topological sort of a series of interconnected nodes.

If you’ve not come across a topological sort before, the way I always envisage it is as a series of tasks with dependencies. Each task has a list of tasks which must be completed first; hopefully for at least some of the tasks this list is empty, otherwise it’s not possible to complete anything. This will be very familiar to you if you’ve ever used make.

What you’re interested in here is the execution order — the order in which you should execute tasks such that no task is executed before those on which it depends. It should always be possible to create such an ordering in a directed acyclic graph (DAG), where “acyclic” means it doesn’t have any cycles (i.e. loops).

To perform this sort, you construct an instance of the TopologicalSorter class, passing a mapping specifying the graph to its constructor. To illustrate this, I’ll be using the graph depicted below as an example. The arrows indicate a dependency, so node A depends on nodes B, D and F, and so on.

[Figure: sample directed graph]

First, we’ll see a simple case of performing an immediate topological sort and returning the entire sorted order.

>>> import graphlib
>>>
>>> graph = {"A": {"B", "D", "F"},
...          "B": {"C"},
...          "C": {"E"},
...          "D": {"E"},
...          "E": {"I"},
...          "F": {"E"},
...          "G": {"H"},
...          "H": {"I"},
...          "I": {"J"} }
>>> sorter = graphlib.TopologicalSorter(graph)
>>> tuple(sorter.static_order())
('J', 'I', 'E', 'H', 'C', 'D', 'F', 'G', 'B', 'A')

This is straightforward enough, but TopologicalSorter has some other tricks up its sleeve for more complex cases where graphs will be incrementally constructed and tasks may be executed in parallel.

First up, there is an add() method for adding a new node and its dependencies to the graph — this can also be used to add new dependencies to an existing node. This is often more convenient than building a single static dict[Hashable, set[Hashable]] and passing it in, as in the code example above.

Once the graph has been constructed, either statically or with one or more calls to add(), the prepare() method is called, which “freezes” the graph — calling add() after prepare() has been called will raise ValueError. This method also checks the graph for cycles and raises CycleError if any are found.

>>> sorter = graphlib.TopologicalSorter()
>>> sorter.add("A", "B", "D", "F")
>>> sorter.add("B", "C")
>>> # ... Lines omitted for brevity...
>>> sorter.add("H", "I")
>>> sorter.add("I", "J")
>>> # Now we create a cycle
>>> sorter.add("J", "G")
>>> sorter.prepare()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../lib/python3.9/graphlib.py", line 104, in prepare
    raise CycleError(f"nodes are in a cycle", cycle)
graphlib.CycleError: ('nodes are in a cycle', ['I', 'H', 'G', 'J', 'I'])
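If you’d rather report the cycle than let the exception propagate, the graphlib documentation notes that the detected cycle is available as the second element of the exception’s args, as a list in which the first and last nodes are the same. Here is a minimal sketch of using that; the helper find_cycle() is invented for this example.

```python
from graphlib import CycleError, TopologicalSorter


def find_cycle(graph):
    """Return one cycle in the graph as a list of nodes, or None."""
    sorter = TopologicalSorter(graph)
    try:
        sorter.prepare()
    except CycleError as exc:
        # args[0] is the message, args[1] is the list of nodes.
        return exc.args[1]
    return None


print(find_cycle({"A": {"B"}, "B": {"C"}}))
print(find_cycle({"A": {"B"}, "B": {"A"}}))
```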

Instead of using static_order() as above, there is also a get_ready() method which returns all the nodes which currently have no unsatisfied dependencies. This will raise ValueError unless prepare() has been called on the graph first, however.

>>> # Uses the definition of "graph" from above
>>> sorter = graphlib.TopologicalSorter(graph)
>>>
>>> sorter.get_ready()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../lib/python3.9/graphlib.py", line 117, in get_ready
    raise ValueError("prepare() must be called first")
ValueError: prepare() must be called first
>>>
>>> sorter.prepare()
>>> sorter.get_ready()
('J',)
>>> sorter.get_ready()
()

As you can see, each node is only returned once; if no more tasks are available, an empty tuple is returned. To free up more nodes to execute, call the done() method on a node that has finished, which marks that dependency as satisfied for any nodes waiting on it. There’s also an is_active() method which returns True if either a call to get_ready() would return at least one task, or the done() method has not yet been called on all the tasks returned by get_ready() calls.

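The code below shows a simple loop to execute all tasks in a freshly prepared graph, which is to say one where prepare() has been called but nothing has yet been fetched with get_ready().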

>>> while sorter.is_active():
...     for task in sorter.get_ready():
...         print(f"Executing {task=}")
...         sorter.done(task)
...
Executing task='J'
Executing task='I'
Executing task='E'
Executing task='H'
Executing task='C'
Executing task='D'
Executing task='F'
Executing task='G'
Executing task='B'
Executing task='A'

At first glance this may not seem to hold any advantages over the simpler static_order() method, but the real advantage comes with supporting parallel execution of tasks across a pool of asynchronous workers — these could be other threads, other processes, or even services on other hosts. Tasks that are ready to be executed would be placed on a work queue to be removed by workers, and finished tasks would be placed on a completion queue to free up more tasks. Something like this.

from graphlib import TopologicalSorter
from queue import Queue

def execute_tasks(sorter: TopologicalSorter,
                  work_queue: Queue,
                  completion_queue: Queue):
    while sorter.is_active():
        # Schedule execution of available tasks.
        for task in sorter.get_ready():
            work_queue.put(task)
        # Wait for the next completed task.
        task = completion_queue.get()
        sorter.done(task)

This isn’t the sort of thing you come across every day, but when it does come up it’s really nice to have something in the library available to avoid you having to roll your own hastily knocked together implementation which probably suffers from holes in its cycle detection logic and the like.
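To make the scheduling loop above concrete, here is a self-contained sketch which wires it up to a small pool of threads. Everything except the TopologicalSorter and Queue usage is invented for illustration: the worker() function, the None sentinel used to stop workers, and the log list which records execution order in place of doing any real work.

```python
import threading
from graphlib import TopologicalSorter
from queue import Queue


def worker(work_queue, completion_queue, log):
    """Pull tasks until a None sentinel arrives."""
    while True:
        task = work_queue.get()
        if task is None:
            break
        log.append(task)            # Stand-in for doing the real work.
        completion_queue.put(task)  # Report the task as finished.


def run_graph(graph, num_workers=3):
    """Execute every node in dependency order, returning the order used."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    work_queue, completion_queue, log = Queue(), Queue(), []
    threads = [
        threading.Thread(target=worker,
                         args=(work_queue, completion_queue, log))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    while sorter.is_active():
        # Schedule execution of available tasks.
        for task in sorter.get_ready():
            work_queue.put(task)
        # Wait for the next completed task, which may unblock others.
        sorter.done(completion_queue.get())
    # Tell each worker to exit, then wait for them all.
    for _ in threads:
        work_queue.put(None)
    for thread in threads:
        thread.join()
    return log


print(run_graph({"A": {"B", "C"}, "B": {"D"}, "C": {"D"}}))
```

In this small graph B and C may run in either order, but D always runs first and A last.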

The only slightly disappointing aspect from my perspective is that they chose to make the structure completely static once prepared: it’s not possible to add new nodes after prepare() has been called. This is understandable, since allowing it would significantly complicate the implementation. There are policy decisions to be made about what happens if you add a new dependency to a task which has already been returned by get_ready(), for example. It would also have had unbounded storage requirements, if it had to remember every task ever completed in case it appeared as a dependency in a task added much later.

That minor drawback aside, however, it’s functionality that I think has definitely earned its place in the library.

Conclusions

That’s it for this post. As is becoming my habit, I’ll write a second post soon[3] which discusses the changes to existing modules in release 3.9. These seem to be a slightly more modest set than in some previous releases, but there are some concurrency improvements with changes to asyncio, concurrent.futures, and multiprocessing; networking features with enhancements to ipaddress, imaplib, and socket; and some additional OS features in os and pathlib.


  1. And without wishing to seem harsh, if that’s the case then it’s frankly entirely your own fault. Maintaining compatibility in the short term is important, but doing it forever is a fool’s errand — I’ve heard enough horror stories of what it did to the state of Microsoft’s codebase to be fairly sure of that. 

  2. Although there are plenty of stylistic issues which aren’t actually bugs that one could raise, such as cumbersomely compact code compromising comprehensibility. Hey, I like that — new rule: all code review comments must alliterate. 

  3. For some value of “soon”. 

Photo by Jan Kopriva on Unsplash