☑ What’s New in Python 3.10 - Library Changes

In this series looking at features introduced by every version of Python 3, we conclude our look at Python 3.10 by looking at updates to the standard library modules.

This is the 23rd of the 28 articles that currently make up the “Python 3 Releases” series.


Over the last couple of articles I’ve taken a look at the new features and modules in Python 3.10, so in this third and final look at the release it’s time to go through the noteworthy changes to existing modules.

At first glance I thought this would be a fairly quick article, as there didn’t seem to be too many major changes to the library. However, once I dug a little deeper I found some interesting changes in statistics, os, ssl, typing and dataclasses, among others. So let’s jump in and see what little nuggets await us.

Data Types

We start off with some small but useful improvements to the pprint module.

pprint

There are a couple of small improvements to the handy pprint module, for pretty-printing data structures. The first is a new underscore_numbers option which, if True, will add underscores as thousands separators to integers (only).

>>> import pprint
>>> x = {"thousand": 1000, "million": 1000000, "billion": 1000000000}
>>> pprint.pprint(x, underscore_numbers=True)
{'billion': 1_000_000_000, 'million': 1_000_000, 'thousand': 1_000}

The second enhancement is that classes constructed using dataclasses.dataclass are now better supported. In the console session below you can contrast the pprint.pprint() output with the output from repr().

>>> import dataclasses
>>> import pprint
>>>
>>> @dataclasses.dataclass
... class MyStructure:
...     one: int
...     two: dict[int, str] = dataclasses.field(default_factory=dict)
...     three: list[int] = dataclasses.field(default_factory=list)
...     four: list[dict[str, str]] = dataclasses.field(default_factory=list)
...
>>> x = MyStructure(one=123, two={1: "one"})
>>> x.three.extend((11, 22, 33))
>>> x.four.append({"one": "un", "two": "deux"})
>>> x.four.append({"one": "eins", "two": "zwei"})
>>>
>>> repr(x)
"MyStructure(one=123, two={1: 'one'}, three=[11, 22, 33], four=[{'one': 'un', 'two': 'deux'}, {'one': 'eins', 'two': 'zwei'}])"
>>> pprint.pprint(x)
MyStructure(one=123,
            two={1: 'one'},
            three=[11, 22, 33],
            four=[{'one': 'un', 'two': 'deux'}, {'one': 'eins', 'two': 'zwei'}])

Numeric and Mathematical Modules

For those fond of statistical analysis, there are three new functions in statistics.

statistics

There are three new functions in the statistics module which all relate to correlating two data sets with each other.

covariance()
As the name implies, this calculates the covariance, which is a measure of the joint variability of two random variables. Broadly this indicates whether there’s a positive or negative correlation between the data sets. The magnitude of the result is harder to interpret and, not being a statistics expert, I’m not even going to try. The Wikipedia page has a lot of detail and a ton of formulae, so do please feel very free to read it in great detail if you wish. Don’t worry, I’ll wait.
correlation()
This calculates Pearson’s correlation coefficient between two sets of data. This indicates how strongly the two sets are linearly correlated, with 1 indicating a perfect positive correlation and -1 indicating a perfect negative correlation. The closer to zero the value is, the weaker the correlation. This is a commonly used measure of correlation, but it’s important to remember that it only looks for linear relationships. I also take a bit of issue with the non-specific function name, since there are other perfectly valid correlation measures that might feasibly be added in the future (for example, Spearman’s Rank Correlation).
linear_regression()
Calculates and returns the slope and intercept point of a linear regression between two data sets. In more general terms this is a “line of best fit”, on the assumption that the two data sets are linearly correlated. The return value is a 2-tuple of (slope, intercept) where slope is the gradient of the best fit line and intercept is where the line intercepts the y-axis. This means the formula for the resultant line would be:

\[ y = \mathrm{slope} \cdot x + \mathrm{intercept} \]

In all three cases the two data sets passed must be the same length and be of at least two items, or you’ll get a statistics.StatisticsError exception.

Here’s a quick console session that shows these values for two negatively correlated data sets with very slight noise.

>>> import random
>>> import statistics
>>>
>>> x = [i * 3 + random.randint(1, 5) for i in range(100)]
>>> y = [i * 8 + random.randint(1, 10) for i in range(100, 0, -1)]
>>>
>>> statistics.covariance(x, y)
-20122.23717171717
>>> statistics.correlation(x, y)
-0.9997719375238351
>>> statistics.linear_regression(x, y)
LinearRegression(slope=-2.681720204644675, intercept=816.0215142159469)

Functional Programming Modules

A single change to itertools comprises the changes in functional programming in this release.

itertools

A small but useful enhancement is the addition of itertools.pairwise() for iterating across overlapping pairs of items from an iterable. It takes any iterable and yields a series of 2-tuples of each adjacent pair.

>>> from itertools import pairwise
>>> print("\n".join(repr(i) for i in pairwise("ABCDEFG")))
('A', 'B')
('B', 'C')
('C', 'D')
('D', 'E')
('E', 'F')
('F', 'G')

Broadly this is equivalent to the following implementation, although in reality it’s implemented in C rather more efficiently.

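from itertools import tee

def pairwise(iterable):
    # Duplicate the iterator, advance one copy by a single item, then zip them.
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)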

File and Directory Access

A series of small enhancements in pathlib, the module for platform-independent handling of paths for files and directories.

pathlib

Some smallish improvements here.

PurePath.parents

A couple of small improvements to the parents attribute of PurePath, which represents a list of the parent directories of the path. As of Python 3.10, this now supports slice notation (as in p.parents[1:3]) and negative offsets (as in p.parents[-2]). You can see both of these demonstrated in the snippet below.

>>> import pathlib
>>> p = pathlib.PurePosixPath("/one/two/three/four/five.ext")
>>> p.parents[1:3]
(PurePosixPath('/one/two/three'), PurePosixPath('/one/two'))
>>> p.parents[-2]
PurePosixPath('/one')
>>> p.parents[-3:]
(PurePosixPath('/one/two'), PurePosixPath('/one'), PurePosixPath('/'))

The method Path.hardlink_to() has been added, which creates a hard link. This does the same thing that Path.link_to() does except that the sense of the operation is reversed: link_to() makes the specified argument a link to the current path, whereas hardlink_to() makes the current path a link to the specified argument. The new method is consistent with the existing symlink_to() method, and the old link_to() method is now deprecated.

It’s good that the hardlink_to() and symlink_to() methods of Path are now consistent with each other, but do note that both are the opposite way around to os.link() and os.symlink(). Your best option is always to read the documentation quite carefully in any case you’re creating links, as overwriting the target file can often result in extremely annoying data loss.

For consistency with their equivalents in the os module, the stat() and chmod() methods of Path now take a follow_symlinks parameter which can be used to make those operations apply to the symlink itself instead of the target. Where supported, this makes such methods behave just like lstat() and lchmod() respectively.

Note that not all platforms support setting symlink permissions separately to the target file, and you’ll get an error like this if you try: NotImplementedError: chmod: follow_symlinks unavailable on this platform.

Generic Operating System Services

A series of platform-specific improvements in os for users of Linux, MacOS and, rather more esoterically, VxWorks. There’s also a new function in platform for querying Linux version information.

os

A few useful additions to those using specific operating systems in this release.

VxWorks: cpu_count()

First up, for anyone doing embedded development using the VxWorks RTOS, os.cpu_count() now works. Having started my career in embedded development around the turn of the millennium, however, I can tell you that if people are using Python on such systems then that industry is in a very different place than when I started (which, honestly, shouldn’t surprise me).

Linux: eventfd()

For those on the rather more mainstream Linux, there’s a new os.eventfd() function which exposes the underlying eventfd(), which in turn is the libc wrapper around the eventfd2 system call. This is a Linux-specific mechanism provided by the kernel, using a special file descriptor to offer something similar to threading.Event. The descriptor doesn’t represent a real file, but just a single 64-bit counter. An initial value for the counter is provided when the descriptor is created, and then a process can read() from the descriptor, which will block while the value is zero. A write() call will increment the counter by the specified value, at which point a waiting read() will return the value and also reset the counter to zero. It’s important to note that if multiple processes are blocked, only one of them will be woken up, not all of them (though see the behaviour of EFD_SEMAPHORE below).

The use of read() and write() is slightly fiddly, since you need to provide a value of exactly the correct number of bytes in a buffer. As a convenience, therefore, the os module also provides wrappers with a friendlier interface: eventfd_read(), which takes only the file descriptor as a parameter, and eventfd_write(), which takes the descriptor and a value. Be warned they don’t validate the type of file descriptor, however.

The example code below demonstrates the semantics.

import os
import random
import sys
import time

def child_process_main(e_fd):
    my_pid = os.getpid()
    start = time.monotonic()
    num_events = 0
    while True:
        result = os.eventfd_read(e_fd)
        print(f"[{my_pid}] {time.monotonic() - start:05.2f} - Received {result}")
        if result > 99:
            print(f"{my_pid} terminating")
            sys.exit(num_events)
        num_events += 1
        time.sleep(0.6)

e_fd = os.eventfd(0)

for i in range(3):
    if (pid := os.fork()) == 0:
        child_process_main(e_fd)
    else:
        print(f"Spawned child {pid}")

for i in range(10):
    time.sleep(0.5)
    value = random.randint(1, 99)
    print(f"Setting {value}...")
    os.eventfd_write(e_fd, value)

time.sleep(0.5)
for i in range(3):
    os.eventfd_write(e_fd, 100)
    pid, status = os.wait()
    print(f"{pid} exited with {os.waitstatus_to_exitcode(status)}")

For those less familiar with Unix process management, I’ll run through the behaviour briefly. The code forks three child processes, all of which go into a loop where they wait for the eventfd to be non-zero. The parent process then sends random values from 1 to 99, each time waking up an arbitrary child which is waiting, then pausing for 0.5 seconds before repeating. The children wait for 0.6 seconds after receiving an event to ensure the same child process shouldn’t process two consecutive events.

After sending 10 such values, the parent process sends the value 100 which acts as a termination signal. It does this three times, once for each child. The child processes also set the number of events they processed as their exit status, which the parent process recovers.

You can see some sample output of running this code below.

Spawned child 26901
Spawned child 26902
Spawned child 26903
Setting 62...
[26902] 00.50 - Received 62
Setting 2...
[26901] 01.00 - Received 2
Setting 96...
[26902] 01.50 - Received 96
Setting 8...
[26901] 02.00 - Received 8
Setting 20...
[26903] 02.50 - Received 20
Setting 29...
[26901] 03.01 - Received 29
Setting 4...
[26902] 03.51 - Received 4
Setting 4...
[26901] 04.01 - Received 4
Setting 80...
[26903] 04.51 - Received 80
Setting 99...
[26901] 05.01 - Received 99
[26902] 05.51 - Received 100
26902 terminating
26902 exited with 3
[26903] 05.54 - Received 100
26903 terminating
26903 exited with 2
[26901] 05.61 - Received 100
26901 terminating
26901 exited with 5

The behaviour of the eventfd can also be modified by use of several flags, which can be bitwise OR’d together and passed as a second argument to eventfd().

EFD_NONBLOCK
Makes the read() and write() calls non-blocking, raising exception BlockingIOError if they would have blocked.
EFD_SEMAPHORE
Modifies the behaviour of read() if the counter is non-zero, so that instead of resetting it straight to zero, it instead just decrements the counter by 1.
EFD_CLOEXEC
Sets the close-on-exec flag, which closes the descriptor if exec() or similar function is called to replace the currently running executable.

If you know you’ll be using Linux, this is quite a convenient mechanism to synchronise actions across processes, especially if the other processes are not Python. However, if you’re dealing with Python only then you might find that the multiprocessing module provides more convenient facilities for this.

If you do use this mechanism, one final note of caution: since these are raw file descriptors, of the same sort returned by os.open() and the like, there is no automatic closing behaviour and you’ll need to look after this yourself. The session below demonstrates that if you repeatedly call os.eventfd() without closing the resultant descriptors, they get leaked and you have no convenient way to close them. If you go too far you’ll hit the OS limit and see the dreaded OSError: [Errno 24] Too many open files error.

>>> import os
>>> os.listdir(f"/proc/{os.getpid()}/fd")
['0', '1', '2', '3']
>>> for i in range(5):
...     e_fd = os.eventfd(0)
...
>>> os.listdir(f"/proc/{os.getpid()}/fd")
['0', '1', '2', '3', '4', '5', '6', '7', '8']

Linux: os.splice()

Another Linux-specific enhancement in this release is the addition of os.splice(). Once again, this is directly exposing the splice() system call, which is used to copy data between two file descriptors without the overhead of moving it through userspace. The important limitation to note, however, is that at least one of the file descriptors must be a pipe.

You may be familiar with os.sendfile(), a wrapper around the underlying sendfile() call, which is a similar zero-copy mechanism to move data between file descriptors [2]. On Linux, I believe these days sendfile() is actually implemented on top of the simpler splice().

To understand how this is useful, it may be helpful to think of a pipe as less of a channel and more of simply a shared buffer. Instead of copying from a pipe (in the kernel) to a buffer in user memory, and then copying back into the kernel to a different file descriptor, you can have the data transferred directly within the kernel.

If, for example, you’ve implemented a reverse-proxying web server in Python and you’ve opened a pipe to receive data from another process, splice() forms a more efficient way to move data to the socket to the HTTP client than performing read() and write() operations through userspace. In combination with sendfile() you could implement a fairly fully-functional HTTP server, capable of both static content and reverse proxying, without any userspace copying at all.

Such an HTTP server is a little complicated to use as an example in this article, so here’s a rather pointless example which illustrates splice() in action, copying 5 bytes of data from a pipe to a temporary output file.

import os
import socket
import tempfile

# Create a pipe and feed some data in.
out_fd, in_fd = os.pipe()
os.write(in_fd, b"Hello, world")

# Open a temp output file, copy data to it then read it.
with tempfile.TemporaryFile() as tmp_fd:
    os.splice(out_fd, tmp_fd.fileno(), 5)
    tmp_fd.seek(0)
    print(f"Got data: {repr(tmp_fd.read())}")

os.close(out_fd)
os.close(in_fd)

In the example above you can see splice() taking the three mandatory parameters: the input and output file descriptors, and the byte count to copy. There are two more optional parameters, offset_src and offset_dst, which specify an offset within the source and destination files from which to copy the data. The nuances here are:

  • The offset for whichever end of the copy is the pipe must be None.
  • Specifying an offset of None means that the current file read/write pointer position will be used.
  • If the current file read/write pointer is used, it will also be updated to reflect the number of bytes read/written.
  • Specifying a non-None value for a file descriptor will start at that offset instead, and also means that the current file position will remain where it is.

The fact that the file write position was updated is the reason why the tmp_fd.seek(0) call was required in the code above: it returns the pointer to the start of the file so it can be read back.

As a final aside, there’s no corresponding appearance of the tee() function for duplicating pipe content, which is often used in conjunction with splice(). Perhaps it might appear in a future Python version, or perhaps it’s esoteric enough that they won’t bother.

MacOS: New Open Flags

Finally, for MacOS users, some platform-specific flags for os.open() have been made available. These are listed below with their purpose.

O_EVTONLY
Indicates that the file is only being opened to receive filesystem events, for example via kqueue.
O_FSYNC
This is an alias for O_SYNC, which guarantees that the written data has actually hit disk by the time write() returns (whereas normally it might be residing in a cache somewhere, but not actually written yet).
O_SYMLINK
If the path specified is a symlink then this flag opens the symlink itself rather than its target.
O_NOFOLLOW_ANY
This flag disallows following of any symlink — if any component of the specified path is a symlink and this flag is included, the open() operation will fail.

platform

There’s a new platform.freedesktop_os_release() function to parse the /etc/os-release (or sometimes /usr/lib/os-release) file and return the results as a dict. I think this is fairly Linux-specific, and might even differ between distributions, but the major ones seem to offer it.

On platforms where it’s not available you’ll get a FileNotFoundError, but where available you’ll get something like the result shown below.

>>> import platform
>>> import pprint
>>> pprint.pprint(platform.freedesktop_os_release())
{'BUG_REPORT_URL': 'http://www.raspbian.org/RaspbianBugs',
 'HOME_URL': 'http://www.raspbian.org/',
 'ID': 'raspbian',
 'ID_LIKE': 'debian',
 'NAME': 'Raspbian GNU/Linux',
 'PRETTY_NAME': 'Raspbian GNU/Linux 11 (bullseye)',
 'SUPPORT_URL': 'http://www.raspbian.org/RaspbianForums',
 'VERSION': '11 (bullseye)',
 'VERSION_CODENAME': 'bullseye',
 'VERSION_ID': '11'}

Networking and Interprocess Communication

A few changes in socket, most of which are perhaps a bit niche, and also in ssl, which are probably of more general interest.

socket

A handful of small changes in socket in this release, the first of which is that the socket.timeout exception is now just an alias for TimeoutError.

>>> import socket
>>> socket.timeout
<class 'TimeoutError'>

There’s also a new IPPROTO_MPTCP constant to create Multipath TCP sockets. This is an experimental modification of TCP to allow it to use multiple different paths concurrently for increased throughput and redundancy. It’s of interest for wireless networks, where a device may be in contact with multiple base stations and be able to distribute a data stream across them, and also potentially useful in datacenters as an alternative to other forms of link aggregation (e.g. IEEE 802.3ad). The big advantage of it is that it presents the same interface as TCP above the sockets layer, so applications don’t need any changes to take advantage of it.

Finally, another constant IP_RECVTOS was also added, which is a socket option to receive an ancillary message with each incoming packet which contains the ToS field from the IP packet. As with receiving any ancillary messages, you’ll need to use the recvmsg() method on the socket instead of recv() and make sure to pass a positive value for the ancbufsize parameter since it defaults to zero.

Note that as far as I know, ancillary messages only apply to UDP not TCP — since TCP is a stream, it doesn’t make any logical sense to recover anything from the IP header since multiple IP packets will comprise the stream. This option is also platform-dependent — I know Linux supports it, but MacOS doesn’t seem to and I’ve no idea about Windows. The constant seems to be defined on all platforms, it’s just a question of whether you get OSError when you call setsockopt() with it.

Here’s a brief snippet showing IP_RECVTOS in action, receiving a ToS byte of \x00.

>>> import socket
>>> sock = socket.socket(socket.AF_INET, type=socket.SOCK_DGRAM)
>>> sock.bind(("", 3333))
>>> sock.setsockopt(socket.IPPROTO_IP, socket.IP_RECVTOS, 1)
>>> sock.recvmsg(1024, 1024)
(b'Hello', [(0, 1, b'\x00')], 0, ('127.0.0.1', 39499))
>>> socket.SOL_IP, socket.IP_TOS
(0, 1)

ssl

A few changes in ssl to support later versions of OpenSSL and improve security by updating default settings and deprecating older protocols.

OpenSSL Dependence

The ssl module now requires OpenSSL 1.1.1 or newer — i.e. dropping support for versions 1.0.2 and 1.1.0, and LibreSSL. Given the ubiquity of OpenSSL, and the fact that 1.1.1 was released in September 2018, it should be a pretty safe bet. The change reduces the effort required to maintain the module and offers a better experience to its users, with fewer optional features that might be missing on other platforms. PEP 644 has a more detailed discussion of the motivations.

In addition, Python 3.10 contains early support for OpenSSL 3.0.0 which is a major new release, including reintroduced FIPS 140 support and Linux Kernel TLS. Some features may start to be only available in 3.0.0, but as of 3.10 this is only the new option OP_IGNORE_UNEXPECTED_EOF. You can check which OpenSSL version was linked with ssl.OPENSSL_VERSION for a human-readable string, or ssl.OPENSSL_VERSION_INFO for a numeric tuple.

>>> import ssl
>>> ssl.OPENSSL_VERSION
'OpenSSL 1.1.1n  15 Mar 2022'
>>> ssl.OPENSSL_VERSION_INFO
(1, 1, 1, 14, 15)
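If you need to adapt to the linked version at runtime, a sketch along these lines should do the job. It relies on tuple comparison of OPENSSL_VERSION_INFO, and uses hasattr() for the new option rather than assuming it's always defined.

```python
import ssl

# Tuple comparison works element by element, so this is true for 3.0.0 and up.
has_openssl_3 = ssl.OPENSSL_VERSION_INFO >= (3, 0, 0)

# The option should only be present when the linked OpenSSL provides it.
has_ignore_eof = hasattr(ssl, "OP_IGNORE_UNEXPECTED_EOF")

print(ssl.OPENSSL_VERSION, has_openssl_3, has_ignore_eof)
```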

This is all good stuff, as OpenSSL development has increased in pace over the last few years and it’s important to make it easy to stay on supported versions. The loss of support for LibreSSL is unfortunate, as this project was originally started to address issues in the OpenSSL codebase, but as the projects diverge it’s understandable that supporting both would become less feasible. Also, with OpenSSL evolving more rapidly now, it’s starting to become more debatable whether LibreSSL has a strong reason to exist any more.

Secure Defaults

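The default settings have been tightened up for better security:

  • Ciphers without forward secrecy or with a SHA-1 MAC are disabled by default.
  • Security level 2 is used, which prohibits weak RSA, DH and ECC keys with less than 112 bits of security.
  • SSLContext defaults to a minimum protocol version of TLS 1.2.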

Deprecations

Use of deprecated functions and constants now results in a DeprecationWarning, and this release also adds a number of features to the deprecation list. Notable instances include ssl.wrap_socket(), which is replaced by a method of the same name on SSLContext, and ssl.match_hostname().

Protocols SSL 3.0, TLS 1.0 and TLS 1.1 are no longer officially supported, although programmers are not explicitly blocked from using them. Some platforms may disallow them, however, and using them is just generally a really bad idea if you have any choice at all.
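For anyone migrating away from the deprecated module-level function, the sketch below shows the general shape of the replacement. The open_tls_connection() helper is a hypothetical name of my own and isn't called here, since that would need a real server to talk to.

```python
import socket
import ssl

# A default context verifies certificates and checks hostnames for you.
context = ssl.create_default_context()

# Being explicit about the floor documents the intent, even if it's the default.
context.minimum_version = ssl.TLSVersion.TLSv1_2

def open_tls_connection(host: str, port: int = 443, timeout: float = 5.0) -> ssl.SSLSocket:
    """Hypothetical helper: use context.wrap_socket() where ssl.wrap_socket() was."""
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        return context.wrap_socket(raw, server_hostname=host)
    except BaseException:
        raw.close()
        raise

print(context.minimum_version, context.check_hostname)
```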

get_server_certificate() Timeout

Finally, the ssl.get_server_certificate() function has had a timeout parameter added, which specifies a timeout in seconds. As regular readers will know, I always love a timeout: every time you implement a blocking function that doesn’t have a timeout, a fairy dies [3].

>>> ssl.get_server_certificate(("127.0.0.1", 443), timeout=5.0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/andy/.pyenv/versions/3.10.8/lib/python3.10/ssl.py", line 1524, in get_server_certificate
    with context.wrap_socket(sock, server_hostname=host) as sslsock:
  File "/home/andy/.pyenv/versions/3.10.8/lib/python3.10/ssl.py", line 513, in wrap_socket
    return self.sslsocket_class._create(
  File "/home/andy/.pyenv/versions/3.10.8/lib/python3.10/ssl.py", line 1071, in _create
    self.do_handshake()
  File "/home/andy/.pyenv/versions/3.10.8/lib/python3.10/ssl.py", line 1342, in do_handshake
    self._sslobj.do_handshake()
TimeoutError: _ssl.c:980: The handshake operation timed out

Internet Protocols and Support

No real new features here, but some changes in urllib.parse to address some potential security exploits.

urllib.parse

The first change here is to disallow the use of ; as a separator for query parameters in the parse_qs() and parse_qsl() functions in the urllib.parse module. As per the HTML 4.01 spec in 1999, servers were advised to support it as an alternative to the more commonly used & character, to save the trouble of frequently having to escape genuine & characters in parameter values.

However, this alternative never really caught on in a big way, and the HTML 5 spec from 2014 makes no mention of it — & is the only option referenced. Moreover, there are serious vulnerabilities that result from mismatches in what constitutes a query parameter between caching proxies and servers — this was listed as CVE-2021-23336 in NIST and is explained in detail in this article. So in short, it has very limited real-world usage and it causes serious security problems — the simple solution is to simply drop it.

>>> import urllib.parse
>>> urllib.parse.parse_qs("one=un&two=deux")
{'one': ['un'], 'two': ['deux']}
>>> urllib.parse.parse_qs("one=un;two=deux")
{'one': ['un;two=deux']}
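If you do have legacy query strings which use semicolons, all is not lost. Although I haven't covered it above, these functions accept a separator parameter which lets you opt in to a different character. As far as I'm aware only a single separator is honoured at a time, as the sketch below suggests.

```python
from urllib.parse import parse_qs

# Opting in to semicolons explicitly restores the old splitting for this call.
legacy = parse_qs("one=un;two=deux", separator=";")

# With ";" selected, "&" is treated as ordinary data within a value.
mixed = parse_qs("one=un;two=deux&three=trois", separator=";")

print(legacy)
print(mixed)
```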

The second change, in a similar vein, is to disallow newlines (\n and \r) and tabs (\t) from URLs, and these will now be stripped by the urllib.parse parser. This is in response to the WHATWG’s guidelines, and apparently this too can lead to attacks, but I don’t have any references for that offhand. In any case, it doesn’t feel like newlines should ever have any place in a real world URL, so I can’t imagine this change upsetting many people.

If you need to know the set of characters regarded as unsafe for any reason, you can use urllib.parse._UNSAFE_URL_BYTES_TO_REMOVE. Since this is just a list, if you really wanted to you could add your own characters to it as well, at least in this release. I wouldn’t bank on it not changing in the future, however, so writing code that modifies it is quite possibly a recipe for future pain [4].

>>> urllib.parse._UNSAFE_URL_BYTES_TO_REMOVE
['\t', '\r', '\n']
>>> urllib.parse.urlparse("http://www\r\n.andy-pearce\t.com/bl\nog/")
ParseResult(scheme='http', netloc='www.andy-pearce.com', path='/blog/', params='', query='', fragment='')
>>>
>>> urllib.parse._UNSAFE_URL_BYTES_TO_REMOVE.append("a")
>>> urllib.parse.urlparse("http://www.andy-pearce.com/blog/")
ParseResult(scheme='http', netloc='www.ndy-perce.com', path='/blog/', params='', query='', fragment='')

Development Tools

Some more changes to typing, which are worth a glance for anyone.

typing

As well as the more significant type hinting features described in the previous article, there are some smaller enhancements within the typing module that are worth noting.

Improvements to Literal

First up are some changes to typing.Literal to conform more closely to PEP 586. This was added in Python 3.8, and I briefly discussed it in a previous article, but to recap the purpose of it is to type hint cases where one or more specific values are valid rather than an entire type. The specific changes in 3.10 are:

  • The list of possible literal values is automatically de-duplicated.
  • Literal objects now ignore parameter ordering when comparing for equality.
  • Equality comparisons also respect types, so literals of different types will not compare equal even if the specific values themselves happen to compare equal.
  • If any parameter is not hashable, equality comparisons will now raise TypeError.

These are all demonstrated in the snippet below:

>>> from typing import Literal
>>> Literal[1, 2, 3, 2, 1]
typing.Literal[1, 2, 3]
>>> Literal[1, 2, 3] == Literal[3, 2, 1]
True
>>> Literal[0] == Literal[False]
False
>>> Literal[frozenset()] == Literal[set()]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/andy/.pyenv/versions/3.10.0/lib/python3.10/typing.py", line 1258, in __eq__
    return set(_value_and_type_iter(self.__args__)) == set(_value_and_type_iter(other.__args__))
TypeError: unhashable type: 'set'

Check for TypedDict

The typing.TypedDict construct is a special type to allow the value type of specific keys of a dict to be specified. At runtime it behaves just like a standard dict, however, and there’s no way to introspect its special status. Until now, that is — there’s a new function is_typeddict() which returns True if the parameter type is a TypedDict, and False otherwise.

typing.Protocol Changes

When using typing.Protocol to allow structural sub-typing (i.e. static duck-typing), the @typing.runtime_checkable decorator is required if you want the Protocol subclass to be checkable with isinstance() and issubclass(). However, using isinstance() on an undecorated Protocol which declares only data variables previously passed without error. As of this release, attempting such a check will raise TypeError. If this suddenly causes exceptions in your code, remember that it’s only enforcing something that you should have been doing all along.

Python Runtime Services

Both contextlib and dataclasses have some useful changes, and inspect has a better way of querying annotations at runtime which is a little more obscure.

contextlib

A few changes in the contextlib module which provides utilities for use with with. First up is a new contextlib.aclosing() context manager, which is similar to contextlib.closing() but calls aclose() instead of close(). This is important in some cases to avoid leaks, as discussed in bpo-41229. That said, I had trouble reproducing the issue myself, so I suspect that something else is doing this cleanup sooner anyway in my (latest) release of Python 3.10.

Secondly, the contextlib.nullcontext manager, which is a no-op context manager for cases where providing one is optional, can now be used as an async context manager as well.

Finally, there’s a new contextlib.AsyncContextDecorator, which allows any async context manager to be used as a decorator. This is the async equivalent of contextlib.ContextDecorator.

dataclasses

There are a couple of enhancements in the dataclasses module.

Slots Support

Support for __slots__ has been added to the @dataclasses.dataclass decorator. If you pass slots=True to the decorator, instead of returning the same class with some additional methods, an entirely new class is returned which defines its attributes in __slots__ [1].

Here you can see an example without the change, where setting a new attribute works as you’d expect.

>>> @dataclasses.dataclass
... class MyNormalClass:
...     foo: int = 9
...     bar: float = 10.5
...
>>> x = MyNormalClass()
>>> x.foo = 111
>>> repr(x)
'MyNormalClass(foo=111, bar=10.5)'
>>> x.another = "Hello"
>>> x.another
'Hello'

But below is an example with slots=True, and you can see that setting a new attribute is disallowed, as is expected with __slots__.

>>>
>>> @dataclasses.dataclass(slots=True)
... class MySlottedClass:
...     foo: int = 99
...     bar: float = 105.5
...
>>> y = MySlottedClass()
>>> y.__slots__
('foo', 'bar')
>>> y.foo = 123
>>> repr(y)
'MySlottedClass(foo=123, bar=105.5)'
>>> y.another = "World"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'MySlottedClass' object has no attribute 'another'

Keyword-Only Fields

It’s now possible to specify that arguments to the __init__() method added by @dataclass must be specified by keyword and not positionally. There are multiple ways to specify this when declaring the class, however.

Firstly, you can just pass kw_only=True to @dataclass to require that all arguments are specified by keyword. Alternatively, if you use the dataclasses.field initialiser then you can specify that an individual argument must be specified by keyword.

>>> import dataclasses
>>>
>>> @dataclasses.dataclass
... class MyClass:
...     normal_arg: int
...     kw_only_arg: int = dataclasses.field(kw_only=True)
...
>>> x = MyClass(123, 456)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: MyClass.__init__() takes 2 positional arguments but 3 were given
>>> x = MyClass(123, kw_only_arg=456)
>>> x
MyClass(normal_arg=123, kw_only_arg=456)

Finally, there’s a rather quirky but convenient approach using a KW_ONLY marker between positional and keyword-only fields, as in the example below.

>>> @dataclasses.dataclass
... class MyClass:
...     one: int
...     two: int
...     _: dataclasses.KW_ONLY
...     three: int
...     four: int
...
>>> x = MyClass(1, 2, three=3, four=4)
>>> x
MyClass(one=1, two=2, three=3, four=4)

inspect

A new get_annotations() function provides a safe way to query the annotations defined on an object. This provides a few conveniences over directly accessing __annotations__, including:

  • Gracefully coping with objects that don’t provide __annotations__ (e.g. those created by functools.partial()).
  • Ignoring annotations inherited from parent classes if a class doesn’t provide its own.
  • Converting string annotations to the actual Python type using eval, provided you pass eval_str=True to get_annotations().
  • Always returning a newly created dict, not a reference to something else you might accidentally change.

This is now the best practice for accessing annotations, and as a result the inspect functions and methods signature(), Signature.from_callable() and inspect.Signature.from_function() all use get_annotations() to do their work.

Smaller Changes

A few minor updates that fit in a nutshell.

Base32 Encoding
The base64 module now offers the functions b32hexencode() and b32hexdecode() to support Base32 encoding with the extended hex alphabet as per RFC 4648.
Bisect Key
The bisect module functions can now take a key parameter, with the same meaning as with sorted().
Curses Extended Colours
The venerable ncurses library is still receiving updates, the latest version being 6.3. Python 3.10 has only now added support for a feature introduced in 6.1, however, which was released back in January 2018. That feature is a set of extended colour functions to support 256-colour terminals. The good news is that Python’s curses library will transparently use these behind the scenes if appropriate, so no code changes are required.
Glob Search Root Directory
The glob() and iglob() functions in the glob module have always searched from the current working directory. As of Python 3.10, they both accept a root_dir parameter to specify an alternative root directory to start the search, or a dir_fd parameter to do the same with a directory descriptor.
Strict Path Checking
If you pass strict=True to os.path.realpath(), then OSError will be raised if the path doesn’t exist or is inaccessible due to a symlink loop.
Original argv
The new sys.orig_argv attribute represents the original command-line arguments passed to the Python executable, as opposed to the processed version that’s exposed to applications via sys.argv. An example of something that’s omitted from sys.argv, but included in this new attribute, is the command string passed to the -c option.
$ python -c "from pprint import pprint; import sys; pprint(sys.argv); pprint(sys.orig_argv)" one two three
['-c', 'one', 'two', 'three']
['/home/pi/.pyenv/versions/python-3.10/bin/python',
 '-c',
 'from pprint import pprint; import sys; pprint(sys.argv); '
 'pprint(sys.orig_argv)',
 'one',
 'two',
 'three']
Standard Library Module Names
Another new attribute in sys, stdlib_module_names, is a frozenset of the names of all the modules which are part of the standard library, just in case your code needs to treat them differently from other modules for some reason. I would personally recommend not needing to write such code as the better option, but life is not always so kind to us.
Hook Retrieval in threading
There are two new functions in threading, gettrace() and getprofile(), to return the functions set by settrace() and setprofile(). There’s also a new threading.__excepthook__ attribute to store the original value of threading.excepthook() in case it gets replaced by something broken.

Conclusion

And that’s the lot for Python 3.10! Once again some useful improvements here, but it does feel a little like the pace of improvements has slowed quite a lot since the early days of the 3.x line. That’s not a bad thing, mind you, it’s important that an established language offers some stability. But it’s heartening to see there’s still plenty going on, and hopefully plenty still to look forward to in future releases.

In my next article in the series, I’ll be looking at the newly-released 3.11, which will be the first article in this series that isn’t totally out of date before I start writing it!


  1. If you’re not familiar with the __slots__ mechanism, it’s a way of declaring a fixed list of attributes rather than storing them in a dynamic dict for each instance. This is generally done for performance in cases where a large number of instances will be created, as the savings in memory and attribute lookup speed can be significant. You can read more about it in the Python documentation.

  2. The sendfile() call is not Linux-specific, but on some platforms the output file descriptor for sendfile() must be a socket. This used to be the case on Linux, but the restriction was lifted in version 2.6.33 of the kernel. 

  3. Well, alright, if you want to get technical about it, it’s got a lot less to do with fairies and a lot more to do with programmers out there grumbling about you as they’re forced to implement their own timeout mechanism on top of your inconsiderately blocking function, but I allow myself a little poetic license. 

  4. But I gather some people are into that, so who am I to judge. 

Next article in the “Python 3 Releases” series: What’s New in Python 3.11 - Performance Improvements
Fri 16 Dec, 2022