What’s New in Python 3.8 - New Features

In this series looking at features introduced by every version of Python 3, we move on to Python 3.8 and see what new features have been added in this release. These features include assignment as an expression, positional-only parameters and two new modules in the standard library.

This is the 17th of the 34 articles that currently make up the “Python 3 Releases” series.


Since before I started this series of articles, I’d always had it in my head that 3.8 was a really big release. I don’t quite know where I got this idea from — perhaps it was simply because that was the newest release around the time I had the idea to write the series. Still, I thought before jumping in I’d try to gather some data to see if I was right.

I didn’t want to do any complicated processing of source code branches, so I went with a rather more pragmatic¹ solution, repeated for each release:

$ curl https://docs.python.org/3/whatsnew/3.0.html | html2text | wc

I’ve no idea how scientific this is, but I figured it was better than a total guess — I think that’s what you might call damning with faint praise. Still, for what it’s worth here’s what I got:

Bar chart of release notes length for each Python release

Don’t read too much into the accuracy — it’s quantised to the nearest 5K characters² — but you can see that my intuition about 3.8 was rather misguided. In fact, the release notes length has been remarkably consistent, all things considered.

Anyway, that brief digression aside, let’s look at what did make it into 3.8, released back in October 2019. That makes it approximately as old as COVID-19. Now that I come to think of it, that’s just blown my mind a little — I hadn’t really quite grasped how much time had passed!

There were a few new features in this release, including a new operator for assignment in expressions, a syntax to force parameters to be passed by position rather than keyword, and a useful addition to f-strings. There are new modules for querying package metadata and for managing shared memory when using multiprocessing. Plus there’s the usual array of library enhancements, such as more asyncio changes, more mathematical functions in math and statistics, some useful changes to functools, and a number of additional facilities in typing.

In this article we’ll focus on the new language features and modules, then in the following article we’ll run through the changes to existing modules.

Assignment Expressions

First up we have a new operator := which has been dubbed the “walrus operator”, for reasons which are readily evident. This operator performs assignment, but also acts as an expression yielding the assigned value — behaviour that will be quite familiar to users of C/C++ or any language which inherited this trait.

This is typically useful where you want to assign a result to a variable, but also compare that value to something at the same time. Here’s an example using re:

>>> import re
>>> regex = re.compile(r"^Hello (?P<name>\w+)")
>>> if (match := regex.match("Hello Andy")) is not None:
...     print(match.group("name"))
...
Andy

You’ll note there are some odd brackets around the expression there — that’s because := has been added right at the bottom of the precedence table, so it binds less tightly than any other operator. If we didn’t use the brackets, it would be equivalent to match := (regex.match(...) is not None), which would assign a bool to match — clearly not what we want in that example. Note that := even binds less tightly than and and or, the conditional expression form of if ... else, and lambda. You can imagine how any of these might trip you up, so my advice is to get into the habit of always using brackets with := to make the precedence explicit³.
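
To make the pitfall concrete, here’s a minimal sketch (reusing the regex from above) of what happens when the brackets are left out:

>>> if match := regex.match("Hello Andy") is not None:
...     print(match)
...
True

The comparison binds first, so match ends up holding True rather than the match object, which is exactly the sort of silent surprise the brackets avoid.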

Another example of where this could help readability is when reading file data in fixed-sized chunks:

with open(filename, "rb") as input_fd:
    while (data := input_fd.read(8192)):
        ...  # Process chunk of data

It’s also potentially useful when you need to calculate the same expression in both the filtering condition and the expression body. Imagine you wanted to iterate across all non-empty lines in a text file with #-style comments stripped out, and all lines forced to lowercase:

with open(filename, "r") as input_fd:
    for text_line in (text.lower() for line in input_fd 
                      if (text := line.split("#", 1)[0].strip())):
        ...

This is getting a little contrived, but that’s just from my lack of imagination — I’ve definitely come across all of these types of issues and been slightly irritated at what would be a very natural construct to write in C++.

However, you should be careful how you use this operator — it can make things concise, but brevity and readability aren’t always the best of friends. For example, take a look at the code below and see how quickly you can figure out what it’s doing — the overuse of assignment expressions here is just one of several issues which make this code a little hard to grasp at first glance.

import sys

with open(sys.argv[1], "r") as one_fd, open(sys.argv[2], "r") as two_fd:
    matches = 0
    while True:
        while (
            (one := one_fd.readline().strip())
            and (two := two_fd.readline().strip())
            and one == two
        ):
            matches += 1
        sys.stdout.write(f"{matches} matching\n" if matches else "")
        diffs = [(one, two)]
        while (
            (one := one_fd.readline().strip())
            and (two := two_fd.readline().strip())
            and one != two
        ):
            diffs.append((one, two))
        sys.stdout.write(f"{num_diffs} differ\n" if (num_diffs := len(diffs)) else "")
        sys.stdout.write("".join(f"<<< {x}\n>>> {y}\n" for x, y in diffs))
        if not one or not two:
            break
        matches = 1

If you want plenty more details on the justifications for adding this operator, and the minutiae of how it interacts with other constructs, have a read of PEP 572.

Positional-Only Parameters

In the section on function parameter changes way back in the first article in this series, I looked at how including the * character in function signatures caused parameters after it to become keyword-only — i.e. they could not be specified as positional parameters.

In this release, the / marker has been put to use for the opposite purpose — parameters before it can only be specified by position, and not as keyword arguments.

You can see the two of them in use in this trivial example:

>>> def func(one, two, /, three, four, *, five, six):
...     print(one, two, three, four, five, six)
...
>>> func(1, two=2, three=3, four=4, five=5, six=6)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: func() got some positional-only arguments passed as keyword arguments: 'two'
>>> func(1, 2, 3, 4, 5, six=6)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: func() takes 4 positional arguments but 5 positional arguments (and 1 keyword-only argument) were given
>>> func(1, 2, 3, four=4, five=5, six=6)
1 2 3 4 5 6

In this example one and two can only be specified as positional, five and six can only be keyword, and three and four can be either.

There aren’t too many cases where I think this would make a big difference, but there may be times when you’d like to make sure callers provide important parameters positionally because you don’t want to guarantee to maintain the parameter names in future. Or sometimes the keyword name may just be confusing to the reader.
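
As a sketch of that first situation, consider this hypothetical distance function: because p and q are positional-only, no caller can come to depend on those names, leaving you free to rename them later without breaking anyone.

>>> def dist(p, q, /):
...     return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
...
>>> dist((0, 0), (3, 4))
5.0
>>> dist(p=(0, 0), q=(3, 4))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: dist() got some positional-only arguments passed as keyword arguments: 'p, q'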

One thing that’s perhaps not immediately obvious is that any parameter names which are positional-only will no longer clash with keyword parameters if you’re using the **kwargs varargs syntax. This could be useful if you’d like to avoid your fixed parameters clashing with the namespace of your keyword ones. In the example below you can see that the names one and two are effectively used twice — once for positional parameters, and then again as keyword arguments.

>>> def func(one, two, /, **kwargs):
...     print(one, two, kwargs)
...
>>> func("hello", "goodbye", one=1, two=2, three=3)
hello goodbye {'one': 1, 'two': 2, 'three': 3}
>>>

For a discussion of lots of interactions and edge cases, take a look at PEP 570.

Self-Documenting Expressions in f-strings

You might recall from an earlier article that in Python 3.6 we looked at the f-strings formatting mechanism. In Python 3.8 a new format specifier has been added, primarily for debugging purposes.

The new = specifier includes the expression itself before the value, separated by an equals sign. The usual f-string formatters are all still valid, and the = sign should always immediately follow the expression with other specifiers after it as normal. Here’s a simple illustration:

>>> name="andy"
>>> age=43
>>> f"{name.title()=!s} {age=:#x}"
'name.title()=Andy age=0x2b'

Although simple, this is a really handy feature for diagnostic log files. Increasingly, systems like Splunk and Humio will automatically parse key=value fields from log lines, and this syntax makes it particularly painless to generate log lines in that form. Of course, to be suitable for these tools you’d want to keep your expressions to simple variable names, but when you’re adding debug logging that will often be the case anyway.
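
For example, here’s a sketch with some hypothetical variable names. Note that the default repr conversion quotes strings, which these tools generally cope with fine:

>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>> user = "andy"
>>> retries = 3
>>> logging.debug(f"connection failed {user=} {retries=}")
DEBUG:root:connection failed user='andy' retries=3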

continue in finally

For obscure implementation reasons, the continue keyword was never permitted within a finally clause. Here’s an example from Python 3.7:

Python 3.7.10 (default, Mar 28 2021, 04:19:36)
[Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> for i in range(-5, 6):
...     try:
...         print(10 / i)
...     finally:
...         continue
...
  File "<stdin>", line 5
SyntaxError: 'continue' not supported inside 'finally' clause

In Python 3.8, however, this restriction has been lifted. For example, note how the code below glides right past the ZeroDivisionError exception that’s triggered at one point:

Python 3.8.8 (default, Mar 28 2021, 04:22:11)
[Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> for i in range(-5, 6):
...     try:
...         print(10 / i)
...     finally:
...         continue
...
-2.0
-2.5
-3.3333333333333335
-5.0
-10.0
10.0
5.0
3.3333333333333335
2.5
2.0

This also illustrates an important point, however — using finally blocks in this way can disguise exceptions. You should generally either combine them with an except clause in the same block, or check sys.exc_info() yourself within the finally clause and at least log the exception if you’re going to continue past it.
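
Here’s a minimal sketch of that second approach: inside a finally clause sys.exc_info() reflects any in-flight exception, so you can at least report it before the continue swallows it.

>>> import sys
>>> for i in range(-2, 3):
...     try:
...         print(10 / i)
...     finally:
...         if (exc := sys.exc_info()[1]) is not None:
...             print(f"Ignoring {exc!r}")
...         continue
...
-5.0
-10.0
Ignoring ZeroDivisionError('division by zero')
10.0
5.0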

dict Comprehension Order

In earlier Python versions, dictionary comprehensions had an odd quirk: the value expression was potentially evaluated before the key expression. This differed from dict literals, where the order was key first, as you might reasonably expect.

You can see this behaviour in 3.7 below:

Python 3.7.10 (default, Mar 28 2021, 04:19:36)
[Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> # Literal
>>> x = {input("Key: "): input("Value: ")}
Key: A
Value: 65
>>> # Comprehension
>>> y = {input("Key: "): input("Value: ") for i in range(3)}
Value: 65
Key: A
Value: 66
Key: B
Value: 67
Key: C

Normally this wouldn’t make any difference, but if those expressions have side-effects then it becomes relevant. In particular, now that assignment expressions have been added in this release, as discussed earlier in this article, it’s more likely people will run into this behaviour — the assignment needs to go in whichever expression is evaluated first if it’s to be referred to in the other expression.

As a result in Python 3.8 the order of evaluation for comprehensions has been swapped over to be consistent with that for literals. Here’s the same code as above, but run under Python 3.8:

Python 3.8.8 (default, Mar 28 2021, 04:22:11)
[Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> # Literal
>>> x = {input("Key: "): input("Value: ")}
Key: A
Value: 65
>>> # Comprehension
>>> y = {input("Key: "): input("Value: ") for i in range(3)}
Key: A
Value: 65
Key: B
Value: 66
Key: C
Value: 67
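
This ordering guarantee is what makes a construct like the following sketch reliable: the walrus assignment in the key expression can safely be referred to in the value expression.

>>> words = ["Hello", "World"]
>>> {(key := word.lower()): f"{key}!" for word in words}
{'hello': 'hello!', 'world': 'world!'}

If the evaluation order were still value-first, the f-string would run before key was bound.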

New Modules

This release contains two new modules — one for querying metadata of installed packages, the other for managing shared memory blocks between processes.

importlib.metadata

Installed packages typically provide metadata compliant with PEP 566 or earlier standards. Some of this can be queried when a package is imported, but sometimes it’s useful to perform queries before importing — this is what importlib.metadata is for.

As with importlib.resources in the previous release, this module is part of an initiative to deprecate the older and less efficient pkg_resources module which was distributed as part of setuptools.

I’m not going to pass myself off as an expert on the details of package metadata — I tend to just follow templates and use the simplest of metadata such as version, author, description and so on. But this module seems potentially pretty useful for debugging problems with package installation among other things, so I’ll run through the highlights. If you want more details on Python packaging, check out the excellent Python Packaging User Guide.

To query metadata about an installed package, you just need to know the distribution name — the name you’d use to install it, which is often (though not always) the same as the name you import. There are a series of functions which take this name as an argument and return the metadata. The simplest is version(), which returns the current package version as a string.

>>> import importlib.metadata
>>> importlib.metadata.version("black")
'22.3.0'

The dependencies of a package can be queried with requires() — this returns a list of dependency specifications, which is a surprisingly rich format. If you want the full details, take a read of PEP 508 — maybe get a coffee first.

>>> import pprint
>>> pprint.pprint(importlib.metadata.requires("black"))
['click (>=8.0.0)',
 'platformdirs (>=2)',
 'pathspec (>=0.9.0)',
 'mypy-extensions (>=0.4.3)',
 'typing-extensions (>=3.10.0.0) ; python_version < "3.10"',
 'tomli (>=1.1.0) ; python_version < "3.11"',
 'dataclasses (>=0.6) ; python_version < "3.7"',
 'typed-ast (>=1.4.2) ; python_version < "3.8" and implementation_name == '
 '"cpython"',
 "colorama (>=0.4.3) ; extra == 'colorama'",
 "aiohttp (>=3.7.4) ; extra == 'd'",
 "ipython (>=7.8.0) ; extra == 'jupyter'",
 "tokenize-rt (>=3.2.0) ; extra == 'jupyter'",
 "uvloop (>=0.15.2) ; extra == 'uvloop'"]

To list the files which make up a distribution, there’s the files() function. This returns a series of PackagePath instances, which are derived from pathlib.PurePath. They’re all paths relative to the site-packages directory in which the package is installed.

>>> pprint.pprint(importlib.metadata.files("black"))
[PackagePath('../../../bin/black'),
 PackagePath('../../../bin/blackd'),
 PackagePath('610faff656c4cfcbb4a3__mypyc.cpython-38-darwin.so'),
 PackagePath('__pycache__/_black_version.cpython-38.pyc'),
 PackagePath('_black_version.py'),
 PackagePath('black-22.3.0.dist-info/AUTHORS.md'),
 PackagePath('black-22.3.0.dist-info/INSTALLER'),
 PackagePath('black-22.3.0.dist-info/LICENSE'),
 PackagePath('black-22.3.0.dist-info/METADATA'),
 PackagePath('black-22.3.0.dist-info/RECORD'),
 PackagePath('black-22.3.0.dist-info/REQUESTED'),
 PackagePath('black-22.3.0.dist-info/WHEEL'),
 PackagePath('black-22.3.0.dist-info/entry_points.txt'),
 PackagePath('black-22.3.0.dist-info/top_level.txt'),
 PackagePath('black/__init__.cpython-38-darwin.so'),
 PackagePath('black/__init__.py'),
 PackagePath('black/__main__.py'),
 PackagePath('black/__pycache__/__init__.cpython-38.pyc'),
 ...

As well as acting like pathlib.PurePath, this class adds size and hash attributes, which expose those metadata values, and a dist attribute holding a Distribution object that represents the whole distribution. PackagePath also has a locate() method which returns the absolute location of the file.

>>> version_py = importlib.metadata.files("black")[4]
>>> version_py
PackagePath('_black_version.py')
>>> version_py.locate()
PosixPath('/Users/andy/.pyenv/versions/python-3.8/lib/python3.8/site-packages/_black_version.py')
>>> version_py.size
19
>>> import os
>>> os.stat(version_py.locate()).st_size
19
>>> version_py.hash
<FileHash mode: sha256 value: flAPrVYQWnjwdFb3cjH7xDLpMyYTZy40uOslXE0nuQc>
>>> import hashlib, base64
>>> with open(version_py.locate(), "rb") as input_fd:
...     print(base64.b64encode(hashlib.sha256(input_fd.read()).digest()))
...
b'flAPrVYQWnjwdFb3cjH7xDLpMyYTZy40uOslXE0nuQc='

Finally, the function importlib.metadata.entry_points() returns, you guessed it, the registered entry points. These come back as a dict where the keys are the entry point groups (e.g. console_scripts, distutils.commands) and the values are each a tuple of the entry points in that group.

Each entry point is represented by an EntryPoint object which has fields name, value and group, where the last of those is the same as the key of the dict in which the entry is found. The name field is whatever the package author chose, and the value field is the standard format for the entry point — for example, package.module:function.
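
As a quick sketch, assuming the same environment with black installed as earlier (the exact value will depend on the installed version), you could dig out its console script like this:

>>> eps = importlib.metadata.entry_points()
>>> black_ep = next(ep for ep in eps["console_scripts"] if ep.name == "black")
>>> black_ep.group
'console_scripts'
>>> black_ep.value
'black:patched_main'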

I think that’s enough to give you a flavour of the functionality, but of course there are more details to be found in the documentation.

multiprocessing.shared_memory

This module provides facilities which allow simple management of System V style shared memory blocks — that is to say blocks of volatile memory which are mapped into the address space of multiple processes and can be accessed by any of them. Whether or not the underlying shared memory actually uses the System V API is platform-dependent, however.

The fundamental class offered is SharedMemory, which represents a block of shared memory. Processes must choose whether to create a new block or attach to an existing one, and set the create parameter appropriately. When creating a new block the size parameter must be specified, and a name parameter can be given to identify the block; if not provided, a random name is chosen.

Here’s some code that briefly illustrates the usage:

from multiprocessing import shared_memory
import os
import random
import sys
import time

SHMEM_NAME = "andy"

def run_child(num):
    shm = shared_memory.SharedMemory(name=SHMEM_NAME)
    try:
        print(f"[CHILD {num}] Memory: {list(shm.buf[:10])}")
    finally:
        print(f"[CHILD {num}] Closing shared memory")
        shm.close()

def main():

    children = []
    for i in range(3):
        pid = os.fork()
        if pid == 0:
            time.sleep(1)
            run_child(i)
            sys.exit(0)
        else:
            children.append(pid)

    shm = shared_memory.SharedMemory(name=SHMEM_NAME, create=True, size=128)
    try:
        shm.buf[:10] = bytearray(random.randint(0, 255) for i in range(10))
        for pid in children:
            os.waitpid(pid, 0)
    finally:
        print("[PARENT] Closing shared memory")
        shm.close()
        shm.unlink()

if __name__ == "__main__":
    main()

When you run this, you should see some output similar to this:

[CHILD 0] Memory: [253, 117, 221, 65, 8, 250, 3, 41, 76, 196]
[CHILD 0] Closing shared memory
[CHILD 1] Memory: [253, 117, 221, 65, 8, 250, 3, 41, 76, 196]
[CHILD 1] Closing shared memory
[CHILD 2] Memory: [253, 117, 221, 65, 8, 250, 3, 41, 76, 196]
[CHILD 2] Closing shared memory
[PARENT] Closing shared memory

As far as I can tell from the documentation this should all be fine, but there’s an issue that I haven’t been able to resolve — when I run this code on both macOS and Linux, I get warnings like these as each process terminates:

/usr/lib/python3.8/multiprocessing/resource_tracker.py:216:
    ... UserWarning: resource_tracker: There appear to be
    ... 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
/usr/lib/python3.8/multiprocessing/resource_tracker.py:229:
    ... UserWarning: resource_tracker: '/andy': [Errno 2]
    ... No such file or directory: '/andy'
  warnings.warn('resource_tracker: %r: %s' % (name, e))

Frankly I’m a bit puzzled by this as it seems to me I’m doing things exactly as I should, but apparently the operation of resource_tracker.py is written with some other behaviour in mind. From reading through bpo-38119 it seems like there’s some known messiness here, so hopefully things will be cleared up at some point.

There’s also a ShareableList class which presents a list-like abstraction over the shared memory buffer, with some notable limitations: the length of the list and the maximum size of each item are fixed at creation. This is probably a more convenient interface. You can create a new list by passing an iterable to provide the initial values, or you can provide the name parameter to attach to an existing list.

>>> from multiprocessing import shared_memory
>>> shared_list = shared_memory.ShareableList(["one", "two", "three"])
>>> foo = shared_memory.ShareableList(name=shared_list.shm.name)
>>> foo[1] = "deux"
>>> list(shared_list)
['one', 'deux', 'three']

There’s also a new SharedMemoryManager in multiprocessing.managers, which is a subclass of multiprocessing.managers.BaseManager. This makes it easier to manage your shared memory blocks, ensuring they’re unlinked correctly using the context manager protocol. Once you’ve created the manager you can call methods to construct either SharedMemory or ShareableList instances. These are always newly created, however, and the assumption seems to be that you’ll pass references to them around rather than creating new ones elsewhere and attaching them to existing names.

import multiprocessing
from multiprocessing import managers
import time

def work_func(shlist, sleep_time):
    time.sleep(sleep_time)
    shlist[sleep_time] = str(sleep_time)
    print(shlist)

def main():
    with managers.SharedMemoryManager() as smm:
        shlist = smm.ShareableList(["alpha", "beta", "gamma", "delta", "epsilon"])
        processes = [multiprocessing.Process(target=work_func, args=(shlist, i+1))
                     for i in range(3)]
        for proc in processes:
            proc.start()
        for proc in processes:
            proc.join()

if __name__ == "__main__":
    main()

Which produces this output:

ShareableList(['alpha', '1', 'gamma', 'delta', 'epsilon'], name='psm_369e21f9')
ShareableList(['alpha', '1', '2', 'delta', 'epsilon'], name='psm_369e21f9')
ShareableList(['alpha', '1', '2', '3', 'epsilon'], name='psm_369e21f9')

It’s also worth noting this doesn’t suffer from the resource_tracker.py errors above, so I can only assume that might be related to the way that references to the same object are shared, instead of attaching by name. Either way, this method of use seems pretty safe, but I’d be cautious using anything more outlandish until some of these issues can be ironed out.

Other Changes

Some smaller changes which weren’t worthy of their own section.

Reversed dict Iteration
Back in one of the articles on Python 3.6 I covered the new dict implementation which maintains insertion order. I also outlined three specific reasons why you might still want to use collections.OrderedDict, one of which was that you can iterate through it in reverse order with reversed(). As of Python 3.8, however, this difference has been erased — if you apply reversed() to a dict or a view on a dict, you no longer get a TypeError and it works as you’d expect.
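
A quick illustration:

>>> d = {"one": 1, "two": 2, "three": 3}
>>> list(reversed(d))
['three', 'two', 'one']
>>> list(reversed(d.items()))
[('three', 3), ('two', 2), ('one', 1)]
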
KeyboardInterrupt Termination
When you hit Ctrl+C on POSIX systems, a SIGINT signal is raised against the process. Python catches this signal and converts it to a KeyboardInterrupt exception so it can easily be caught. If it propagated to the outermost scope, however, the interpreter would terminate with an exit status of 1 instead of the status you get with a signal — typically reported by the shell as 128+n where n is the signal number, 2 in the case of SIGINT. That has been changed in Python 3.8, and now the exit status will correctly reflect the termination by SIGINT.
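
You can observe the difference from Python itself with a small sketch using subprocess, which reports death-by-signal as a negative return code:

import signal
import subprocess
import sys
import time

# Start a child interpreter that just sleeps, then deliver SIGINT to it.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
time.sleep(1)  # give the child a moment to start up
child.send_signal(signal.SIGINT)
child.wait()

# On 3.8 the child terminates by re-raised SIGINT, so this prints -2;
# on 3.7 and earlier the interpreter exits normally and you'd see 1.
print(child.returncode)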

Conclusions

A collection of handy changes, especially assignment expressions, which promise to make certain constructions rather more convenient. The f-string enhancement is also convenient for the specific case of diagnostic logging. The two new modules are a little niche, and shared_memory in particular seems to have some rough edges that need smoothing off, so I’m tempted to stay away from it for now. Frankly, if I’m going to divide up work between processes I prefer to limit the shared information anyway and use more formal IPC such as queues.

So that’s it for this article — as usual in part 2 we’ll look at the standard library changes.


  1. Some might say it’s so pragmatic it hurts. 

  2. Because I’m lazy and I drew it in a vector drawing package, because it was the easiest way to get a nice SVG. I could have turned off snapping and made things a little more accurate by hand, but life’s too short. That’s right: it’s long enough to write a massive series of huge articles about Python releases nobody cares about any more, but it’s too short to spend a few minutes drawing an accurate graph. I never claimed prioritisation was a core competency of mine. 

  3. Personally I’m never a fan of writing code which requires the reader to know the details of operator precedence in that language — it just strikes me as a false economy. Those brackets don’t take long to type and they may just save someone from making a bug — perhaps even the author! Often when I’ve seen people complain about unnecessary brackets, it’s been nothing more than a thinly disguised way of showing off their encyclopedic knowledge of the language in question. This sort of thing is frankly a waste of time — channel your efforts into solving real problems, not memorising tables. 

The next article in the “Python 3 Releases” series is What’s New in Python 3.8 - Library Changes
Wed 6 Jul, 2022