What’s New in Python 3.11 - Improved Modules II

22 Jan 2023 at 1:30PM in Software

In this series looking at features introduced by every version of Python 3, we continue our tour of changes in Python 3.11, covering some more standard library changes in numerical modules, path manipulation, SQLite support and more.

This is the 28th of the 28 articles that currently make up the “Python 3 Releases” series.


Following on from the previous article, this is probably my penultimate look at the new Python 3.11 release. In it we’ll run through more changes to the standard library. There’s a fair bit to get through here, so buckle up and let’s jump in![1]

Numeric and Mathematical

There are a handful of changes in the mathematical modules in this release, with some small changes to fractions and some slightly more substantial changes to math.

fractions

A couple of changes to fractions, the first being that underscores are supported in numbers, as per PEP 515, when initialising Fraction from a string.

>>> import fractions
>>> fractions.Fraction("3_200/100_000")
Fraction(4, 125)

The second change is that Fraction now supports an __int__() method. This doesn’t change too much, because it already offered __trunc__(), which meant you could still pass one to int() and get a truncated integer value, since int() falls back on __trunc__() if no __int__() method is found. Though, whilst we’re on the subject, that delegation is deprecated in Python 3.11, so it will presumably stop working in a few releases.

It’s also interesting to note there’s a good reason that __int__() wasn’t added to Fraction originally, because before the advent of the __index__() and __trunc__() methods, the meaning of __int__() was rather overloaded — it meant you could use the wrong type in contexts such as being passed to chr() where things like a Fraction or Decimal really shouldn’t have worked. Due to the deprecation periods, this sort of thing only actually started raising TypeError in Python 3.10.

Anyway, past and future deprecations aside, the main difference in this particular release is that the existence of the __int__() method means that a Fraction instance now passes an isinstance() check against typing.SupportsInt.

>>> x = fractions.Fraction(20, 3)
>>> int(x)
6
>>> x.__int__()
6
>>> import typing
>>> isinstance(x, typing.SupportsInt)
True

math

The math module now exposes a couple of new functions from libm:

  • math.exp2(n) returns $ 2^{n} $.
  • math.cbrt(n) returns $ \sqrt[3]{n} $.

There are also a couple of small changes for better IEEE 754 compliance.

exp2()

I think this is really just a consistency change, exposing the exp2() function in the underlying libm library, because Python already has 2**x and math.pow(2, x). However, when you write an implementation for a fixed base, it can be possible to be more accurate across a wider range of exponents. I’m not qualified to comment on the accuracy of the exp2() implementation, but there are cases where it differs from math.pow(): in the snippet below the two differed for around a quarter of a percent of a million randomly selected values.

>>> import math
>>> math.exp2(1.5)
2.8284271247461903
>>> 2 ** 1.5
2.8284271247461903
>>> 
>>> import random
>>> count = 0
>>> num_samples = 1_000_000
>>> for n in range(num_samples):
...     x = random.uniform(-1000.0, 999.0) + random.random()
...     if math.pow(2, x) != math.exp2(x):
...         count += 1
...
>>> count * 100 / num_samples
0.2348

cbrt()

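Another libm function being exposed, but there’s more of a case for this than exp2() since math.pow() doesn’t support negative bases with fractional exponents, raising a ValueError. The ** operator works, but for a negative base it returns the complex principal root rather than the real cube root, so it can never match cbrt() for negative inputs. As you can see in the snippet below, its result differs from cbrt() in around 87% of cases overall. Roughly half of the samples are negative, which accounts for much of that figure, and for the positive values it’s reasonable to assume that cbrt() is the more accurate.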

>>> math.cbrt(1.860867)
1.23
>>> math.pow(1.860867, 1/3)
1.23
>>>
>>> count = 0
>>> num_samples = 1_000_000
>>> for n in range(num_samples):
...     x = random.uniform(-1000.0, 999.0) + random.random()
...     if math.cbrt(x) != x ** (1/3):
...         count += 1
...
>>> count * 100 / num_samples
86.5953

IEEE 754 Compliance

A couple of small changes here. The first is that math.nan can always be assumed to be available — previously it may not have been defined if IEEE 754 support wasn’t available. The second change is that math.pow(0.0, -math.inf) and math.pow(-0.0, -math.inf) now return math.inf as the IEEE 754 spec requires — previously they raised ValueError.

In my view, perhaps the most notable change is the direct consequence of these changes: that the Python interpreter build now requires IEEE 754 support and will fail without it. I don’t see this being a major issue for many people, mind you, as I very much doubt Python would be used on any platforms so minimal they don’t support floating point.

File and Directory Services — pathlib

The glob() and rglob() methods of the pathlib.Path object and its subclasses did not behave quite correctly prior to Python 3.11 when the pattern supplied ended with a directory separator. In this case these methods are supposed to return only matches which are themselves directories, but the trailing separator was being ignored.

This has been corrected in Python 3.11, as demonstrated in the snippet below.

>>> import pathlib
>>> list(pathlib.Path("/etc").glob("sudo*"))
[PosixPath('/etc/sudoers.d'), PosixPath('/etc/sudoers'), PosixPath('/etc/sudo_lecture')]
>>> list(pathlib.Path("/etc").glob("sudo*/"))
[PosixPath('/etc/sudoers.d')]

Data Persistence — sqlite3

A number of changes in sqlite3 in this release. I don’t quite know why the sudden burst of activity on this library in this release, perhaps the core maintainer of this module finally had some free time — but whatever the reason, there are some welcome changes here!

Due to the number of changes this section is a little long, so if you’re not remotely interested in SQLite, skip to the Data Compression and Archiving section.

There are a few changes that I think it’s worth drilling into here. The ones where I’ve gone into a little detail and given their own subsections are:

  • Altering runtime limits
  • Dealing with exceptions in user callback functions
  • Serialisation of in-memory databases to bytes
  • Incremental access to BLOB fields

There’s also a final section which groups together a set of changes which I felt only warranted a brief mention.

Altering Runtime Limits

Connection objects now offer the setlimit() and getlimit() methods to access SQLite’s runtime limits API. This allows various resource limitations to be set on a per-connection basis, with the following list showing the limits available.

SQLITE_LIMIT_LENGTH
Maximum permitted size of any string, blob or row, in bytes.
SQLITE_LIMIT_SQL_LENGTH
Maximum permitted length of an SQL statement, in bytes.
SQLITE_LIMIT_COLUMN
Maximum permitted number of columns in a table, index or view. Also used as the limit on the number of columns in a SELECT result set, the number of terms in GROUP BY and ORDER BY clauses, and the number of terms in an INSERT statement.
SQLITE_LIMIT_EXPR_DEPTH
SQLite parses expressions into a tree structure, and this limit sets the maximum permitted depth of this tree.
SQLITE_LIMIT_COMPOUND_SELECT
Limits the number of individual SELECT statements in a compound SELECT[2].
SQLITE_LIMIT_VDBE_OP
The maximum number of instructions in a virtual machine program used to implement an SQL statement. If you want to know more details about the impact of this, you need to read up on the SQLite bytecode engine.
SQLITE_LIMIT_FUNCTION_ARG
The maximum permitted number of arguments that can be passed to any function. SQLite should support much larger numbers of arguments than are allowed by default, but the assumption is that people passing hundreds or thousands of arguments are probably attackers looking for exploits, not legitimate users.
SQLITE_LIMIT_ATTACHED
The maximum number of databases that can be attached to the same connection. Attaching multiple databases is an SQLite feature which allows multiple database files to be treated as a single unified database by the connection.
SQLITE_LIMIT_LIKE_PATTERN_LENGTH
The maximum permitted length of the pattern that can be passed to the LIKE or GLOB operators. This is limited to avoid denial of service attacks which could result from very large patterns (i.e. millions of bytes).
SQLITE_LIMIT_VARIABLE_NUMBER
The maximum permitted index number of a parameter passed along with the query.
SQLITE_LIMIT_TRIGGER_DEPTH
If using recursive triggers, a feature added in 3.6.18 to allow one trigger’s operation to trigger others, this sets the maximum permitted depth of such recursion. Alternatively you could disable them entirely with PRAGMA recursive_triggers.
SQLITE_LIMIT_WORKER_THREADS
The maximum number of additional worker threads that a single prepared statement can initiate.

Each setting has a hard limit, which is set at compile-time, and trying to increase the value above this will silently set it to the hard limit value instead. The return value of setlimit() is whatever the previous value was, whether or not the call changed it. The non-destructive way to query the current value is to call getlimit().

The snippet below shows us querying a table which doesn’t exist and (as expected) getting an error to that effect. But then we query the limit of the SQL query string, and change it to an absurdly small 10 characters, and then we see the error we get back has changed.

>>> import sqlite3
>>> conn = sqlite3.connect(":memory:")
>>> conn.execute("SELECT * FROM some_table_that_does_not_exist")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
sqlite3.OperationalError: no such table: some_table_that_does_not_exist
>>> conn.getlimit(sqlite3.SQLITE_LIMIT_SQL_LENGTH)
1000000000
>>> conn.setlimit(sqlite3.SQLITE_LIMIT_SQL_LENGTH, 10)
1000000000
>>> conn.getlimit(sqlite3.SQLITE_LIMIT_SQL_LENGTH)
10
>>> conn.execute("SELECT * FROM some_table_that_does_not_exist")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
sqlite3.DataError: query string is too large

Callback Exceptions

There are a few places within sqlite3 where you can register your own callback functions which are directly invoked from the SQLite code. Some examples of functions which register such callbacks are:

  • create_function() declares a user function which can be called from SQL.
  • create_aggregate() specifies a class with methods to determine how to aggregate multiple values into one in a customised manner.
  • create_collation() is a way of implementing custom sort orderings using a custom function.
  • set_authorizer() registers a callback which is invoked to control access to columns and tables.
  • set_progress_handler() sets a callback to be invoked every N instructions in the virtual machine, so a single thread can still perform background tasks during long-running queries.
  • set_trace_callback() invokes the specified function for every SQL statement which is run.

Since these are user-defined Python code they can, of course, raise exceptions. Since these pass through the SQLite code, however, preserving the tracebacks doesn’t come for free. As a result, you can decide whether you want them by calling sqlite3.enable_callback_tracebacks(). In the snippet below you can see the default behaviour where such exceptions are silently swallowed:

>>> import sqlite3
>>> def naughty():
...     raise ValueError("XXX")
...
>>> def callback(stmt):
...     print(f"Running: {stmt}")
...     naughty()
...
>>> conn = sqlite3.connect(":memory:")
>>> conn.set_trace_callback(callback)
>>> cursor = conn.execute("SELECT 1")
Running: SELECT 1

If you enable tracebacks, you don’t get to catch the exception, but they are printed, as demonstrated in the continuing snippet below.

>>> sqlite3.enable_callback_tracebacks(True)
>>> try:
...     cursor = conn.execute("SELECT 1")
... except Exception as exc:
...     print(f"Caught exception: {exc}")
...
Running: SELECT 1
Exception ignored in: <function callback at 0x10d6d72e0>
Traceback (most recent call last):
  File "<stdin>", line 3, in callback
  File "<stdin>", line 2, in naughty
ValueError: XXX

You can see that the exception wasn’t caught, but it was printed.

So far so Python 3.10. The difference in Python 3.11 is simply that instead of calling PyErr_Print() to display the exception, it calls PyErr_WriteUnraisable() to trigger an “unraisable exception”. By default this is much the same, but it means that you can register another callback using sys.unraisablehook, which gives you a chance to deal with the exception yourself. The snippet below shows how you can do this (with minor edits for formatting).

>>> import sys
>>> def exception_hook(exc):
...     print(f"Got exception: {exc}")
...
>>> sys.unraisablehook = exception_hook
>>> cursor = conn.execute("SELECT 1")
Running: SELECT 1
Got exception: UnraisableHookArgs(
    exc_type=<class 'ValueError'>,
    exc_value=ValueError('XXX'),
    exc_traceback=<traceback object at 0x10d72cf00>,
    err_msg=None,
    object=<function callback at 0x10d6d72e0>)

Serialisation

Support for the SQLite serialisation API has been added in the form of two new methods on the sqlite3.Connection object called, appropriately enough, serialize() and deserialize().

Calling serialize() will produce a bytes object which represents the content of the database. For the common case of a database backed by a file on disk, the serialised version will simply be a copy of the file. However, for an in-memory database it will be the byte stream that would be written to disk if the database were saved.

The deserialize() method performs the inverse, converting a serialised byte stream into an in-memory database. Whatever the connection is currently attached to will be disconnected and, in the case of an in-memory database, dropped.

This is demonstrated in the simple example below.

>>> import sqlite3
>>> conn = sqlite3.connect(":memory:")
>>> cur = conn.cursor()
>>> cur.execute("BEGIN")
<sqlite3.Cursor object at 0x105f036c0>
>>> cur.execute("CREATE TABLE foo (one INTEGER PRIMARY KEY, two TEXT)")
<sqlite3.Cursor object at 0x105f036c0>
>>> cur.execute("INSERT INTO foo (one, two) VALUES (1, 'aaa')")
<sqlite3.Cursor object at 0x105f036c0>
>>> cur.execute("INSERT INTO foo (one, two) VALUES (2, 'bbb')")
<sqlite3.Cursor object at 0x105f036c0>
>>> cur.execute("COMMIT")
<sqlite3.Cursor object at 0x105f036c0>
>>> cur.execute("SELECT * FROM foo")
<sqlite3.Cursor object at 0x105f036c0>
>>> cur.fetchall()
[(1, 'aaa'), (2, 'bbb')]
>>>
>>> data = conn.serialize()
>>> type(data)
<class 'bytes'>
>>> len(data)
8192
>>> cur.execute("DELETE FROM foo")
<sqlite3.Cursor object at 0x105f036c0>
>>> cur.execute("SELECT * FROM foo")
<sqlite3.Cursor object at 0x105f036c0>
>>> cur.fetchall()
[]
>>> conn.deserialize(data)
>>> cur = conn.cursor()
>>> cur.execute("SELECT * FROM foo")
<sqlite3.Cursor object at 0x105f03640>
>>> cur.fetchall()
[(1, 'aaa'), (2, 'bbb')]

You can see that an in-memory database is created with a simple table foo, with two records in it. This is serialised into the data variable, and then the entries in the in-memory version are wiped. Upon the call to deserialize() the data is replaced in memory as it was, and the records are back.

Incremental Blobs

Over the years, databases have increasingly gone from being purely oriented around structured records of fundamental types, such as integers and small strings, to being more general purpose data stores. Examples of this include content management systems or wikis, which are really document-oriented storage but are often built on top of relational backend databases, sometimes because they date from before NoSQL databases enjoyed mainstream popularity.

The BLOB type in SQLite works well enough for storing data of reasonable size, with the usual limit being a billion bytes (~953 MB). At these sizes, however, the API’s approach of loading entire values into memory in order to use them becomes cumbersome — memory footprints of hundreds of MB are, at best, inefficient and may be impractical.

Fortunately the SQLite API offers a file-like view of these values, and as of Python 3.11 this API is now exposed as part of the sqlite3 module.

The blobopen() method on the sqlite3.Connection opens a BLOB value like a file. A table, column and row are specified as arguments, and an sqlite3.Blob object is returned, which is a file-like object providing the usual read(), write(), seek() and tell() methods. Also, just as with normal file objects, the Blob object can be used as a context manager so it’s automatically closed.

Take a look at the code below to illustrate this.

>>> conn = sqlite3.connect(":memory:")
>>> conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, data BLOB)")
<sqlite3.Cursor object at 0x10f3436c0>
>>> conn.execute("INSERT INTO foo (id, data) VALUES (123, zeroblob(64))")
<sqlite3.Cursor object at 0x10f343640>
>>> conn.execute("INSERT INTO foo (id, data) VALUES (456, zeroblob(64))")
<sqlite3.Cursor object at 0x10f3436c0>
>>> with conn.blobopen("foo", "data", 123) as blob_fd:
...     blob_fd.write(b"Hello")
...     blob_fd.write(b", ")
...     blob_fd.write(b"world!")
...
>>> with conn.blobopen("foo", "data", 123, readonly=True) as blob_fd:
...     print(blob_fd.read(16))
...
b'Hello, world!\x00\x00\x00'

Hopefully this is fairly straightforward, but there are a couple of quirks. Firstly, blobopen() can’t change the size of a blob, so the INSERT statements above use the SQLite function zeroblob() to create BLOB values of 64 bytes filled with null bytes.

Secondly, you’ll notice that it’s bytes objects that are read and written; str and other Unicode-aware objects are not supported, and must be encoded/decoded as needed.

Smaller Changes

There are a number of small changes which don’t need much explanation, which I’ve grouped below.

Unsetting Authoriser
The set_authorizer() function sets a callback that’s used in SQLite’s authorisation callbacks scheme. As of Python 3.11 this can now also accept None to remove the callback.
Unicode Collation Names
Calling create_collation() creates a new collation, mentioned earlier. In this release the names of collations can now also include Unicode characters. Passing invalid characters raises UnicodeEncodeError.
Extended Error Codes
SQLite exceptions are now enriched with the SQLite extended error codes, whose code and name can be recovered using the sqlite_errorcode and sqlite_errorname attributes of the exception.
Checking Thread Safety
The sqlite3.threadsafety attribute now reflects the level of thread safety that the underlying SQLite library was compiled with, where previously it was hard-coded to 1. The value is an integer which corresponds to the defined meanings for the field in the DB API 2.0.
Window Functions
Support for create_window_function() has been added which creates a window function. This is similar to an aggregation function, but instead of collapsing multiple values into fewer rows, this modifies rows based on surrounding rows — for example, to calculate a moving average.

Data Compression and Archiving — zipfile

A few improvements to the zipfile module, which provides an interface for manipulating files in ZIP format.

Creating Directories Within Archives

The most significant improvement in zipfile in this release is the addition of the mkdir() method for creating an empty directory directly within an archive file. There are two ways to call this method:

  • With a string specifying a directory name, and an optional second mode argument which specifies the permissions (defaulting to 511 in decimal, which is 0o777 or rwxrwxrwx).
  • Using a zipfile.ZipInfo instance instead of a string, where attributes such as filename and permissions are already stored[3].

>>> import zipfile
>>> zip_file = zipfile.ZipFile("/tmp/test.zip", mode="a")
>>> print("\n".join(repr(i) for i in zip_file.infolist()))
<ZipInfo filename='testdir/' filemode='drwxr-xr-x' external_attr=0x10>
<ZipInfo filename='testdir/some_file' filemode='-rw-r--r--' file_size=24>
<ZipInfo filename='testdir/another_file' filemode='-rw-r--r--' file_size=30>
>>> zip_file.mkdir("testdir/subdirectory", mode=0o750)
>>> print("\n".join(repr(i) for i in zip_file.infolist()))
<ZipInfo filename='testdir/' filemode='drwxr-xr-x' external_attr=0x10>
<ZipInfo filename='testdir/some_file' filemode='-rw-r--r--' file_size=24>
<ZipInfo filename='testdir/another_file' filemode='-rw-r--r--' file_size=30>
<ZipInfo filename='testdir/subdirectory/' filemode='drwxr-x---' external_attr=0x10>

Due to the poor state of the documentation, it’s quite hard to create an accurate ZipInfo instance directly — in particular some of the fields are platform-specific. However, do bear in mind there’s a factory method ZipInfo.from_file() which creates a ZipInfo instance representing a file on the filesystem. This is probably the easiest way to factor out all those platform-specific concerns, providing you do have a real file to use as a model.

Smaller Changes

A couple of other, smaller, changes:

Metadata Encoding
Added a metadata_encoding parameter to the ZipFile constructor which specifies the encoding of the metadata within the ZIP file. This is only valid when reading ZIP files: if mode is anything other than "r", the constructor raises a ValueError. Also, there is only a single setting on a ZipFile instance, so there’s no way to deal with cases where different members in the same file use different encodings for their filenames. Finally, any flags in the ZIP file header which specify an encoding override this setting.
Better Path Support
The zipfile.Path class represents a path within a specific zip file, including the path of the zip file itself. As of this release, it now offers additional attributes stem, suffix and suffixes, which have the same meanings as for pathlib.PurePath.

Cryptographic Services — hashlib

A handy new helper function for hashing file-like objects and a couple of changes to hashing function implementations.

Hashing Files

Hashing the content of files is not uncommon, but to do so with hashlib is a little tedious because you end up writing that same boilerplate about reading files in fixed-sized chunks[4] and passing them into the hash object until they’re done. It’s not difficult, but it’s dull to have to write.

Thankfully there’s now a new helper hashlib.file_digest() to do this for you. It takes an open file object and either the name of a hash algorithm, or a callable which returns a suitable hash object.

The function then proceeds to construct an instance of the hash object, then read file data in fixed chunks and pass them into the hash, which is then returned so the calling code can retrieve the digest in whatever format it wants (typically raw binary or hex).

>>> import hashlib
>>> with open("/usr/share/dict/words", "rb") as in_fd:
...     hash_obj = hashlib.file_digest(in_fd, hashlib.sha1)
...
>>> hash_obj.digest()
b'\xa6.\xdf\x86\x85\x92\x0f}Z\x95\x110 c\x1c\xde\xbd\x18\xa1\x85'
>>> hash_obj.hexdigest()
'a62edf8685920f7d5a95113020631cdebd18a185'

An important note here is that the file must be open in binary mode, or you’ll get a ValueError when you pass it to file_digest(). Aside from the obvious fact that hash functions are byte-oriented, so this makes logical sense, it’s also because the code uses the readinto() method[5] of the file object to read data into a pre-allocated bytearray object, and this method is only available in binary mode.

It’s also interesting to note that it uses memoryview to access the content of the bytearray without copying — this is the sort of efficiency it’s easy to forget when rolling your own implementation, demonstrating the value of having this in the standard library.

Prefer libb2 for BLAKE2

BLAKE2 is a hash function, defined in RFC 7693, which is an evolution of one of the entrants for NIST’s competition to find the hash function to use as the SHA-3 standard. It did well, making it to the final 5, but was eventually pipped to the post by the Keccak family of functions.

With its performance improvements over the original BLAKE, which was assessed for the competition, it has some claims to offer equivalent protection to SHA-3 but at better performance. The newer BLAKE3 standard is much faster still, but not yet offered in the Python standard library, so outside the scope of this article.

Since being introduced into hashlib back in Python 3.6, the implementation was one stored in the CPython source rather than linking to an external library. However, the official libb2 library is more optimised and reduces the maintenance burden on Python maintainers, so it makes sense to use it. As a first step in that direction in Python 3.11, the Python build now uses the library where it’s available when Python is being compiled, but still maintains its own implementation as a backup.

As an aside, if you’re compiling Python on a system with libb2 installed, but you don’t want to create a dependency on it, the notes on issue bpo-47095 have some advice for that towards the end.

Use tiny_sha3 When OpenSSL Unavailable

Another change of implementation, this time for SHA-3. This one isn’t going to affect most people, since as of Python 3.10 if OpenSSL is used then version 1.1.1 is required — since this has an optimised implementation, it’s used whenever it’s available.

Therefore, only in cases where OpenSSL support is disabled at compile time, as of Python 3.11 the public domain tiny_sha3 module replaces the previous vendored copy of the Keccak Code Package which was in the source repository.

The tiny_sha3 implementation is slower but the stripped library size is around 70% smaller. This means that the vast majority of users with OpenSSL support get a smaller build, but at the expense of a (presumably) tiny minority suffering a performance drop in SHA-3.

Generic OS Services

A couple of small enhancements to the logging module to improve introspection and use of syslog, and some improvements to time.sleep() to use higher-resolution clocks.

logging

A couple of small changes, the first of which is generally useful, the second of interest mainly just to people who use SysLogHandler.

Level Names Mapping

There are occasions where it’s useful to present a list of the available logging levels, for example as part of a --help option where the log level can be supplied as a command-line parameter. However, prior to Python 3.11 there’s been no way to do this short of hard-coding them. As of this release, there’s the getLevelNamesMapping() function.

This returns a dict mapping string level names to their numeric levels.

>>> import logging
>>> from pprint import pprint
>>> pprint(logging.getLevelNamesMapping())
{'CRITICAL': 50,
 'DEBUG': 10,
 'ERROR': 40,
 'FATAL': 50,
 'INFO': 20,
 'NOTSET': 0,
 'WARN': 30,
 'WARNING': 30}

Syslog Automatically Reopens Socket

In this release the SysLogHandler provided by the logging.handlers module has been updated so its behaviour more closely reflects that of SocketHandler and DatagramHandler with respect to socket opening.

Prior to this release, if the underlying socket for SysLogHandler became closed, the handler would just raise an error. Now the connection logic has been factored out into a createSocket() method, and this is called both during __init__() and every time a logging event is emitted, if the current socket is closed. This means that the handler should only fail if the remote socket is unavailable at the time the event is emitted.

time

The implementation of time.sleep() is platform-dependent, but on Unix and Windows it has changed in this release to use higher-resolution timers where available.

Unix

On Unix, sleep() has typically been implemented using the select() call, which allows for microsecond resolution delays. The resolution isn’t a major issue, since a microsecond is still a very short wait, but there is a more serious issue — select() bases its delays on current time of day and hence will be impacted by clock changes.

Automated systems like ntpd and chrony usually try very hard to only change system clocks gradually (generally known as “slewing” them). However, it’s hard to rule out sudden jumps if clocks become badly unsynchronised, and it wouldn’t be great for application resilience if such a jump caused a sleep() based on select() to suddenly wait for an hour instead of a few seconds.

Fortunately, on some platforms there are functions less prone to these issues, and which also offer higher-resolution times. These functions are nanosleep() and clock_nanosleep(). These functions both allow the current thread to pause for a period specified in nanoseconds, but clock_nanosleep() offers more flexibility such as selecting the system clock to be used.

As of Python 3.11, time.sleep() on Unix systems will prefer to use clock_nanosleep() with CLOCK_MONOTONIC where this function is available. It also passes an additional TIMER_ABSTIME flag in this case, which means that the behaviour of the function becomes “sleep until this absolute time” instead of “sleep for this many nanoseconds”. This avoids problems where the interval becomes inaccurate under a high rate of signals, because each signal interrupts the waiting and the recalculation of the remaining time may suffer rounding errors.

Where clock_nanosleep() isn’t available, the implementation falls back on nanosleep(). This also uses CLOCK_MONOTONIC on Linux, but potentially CLOCK_REALTIME on other platforms as this is specified by POSIX. It also doesn’t offer the TIMER_ABSTIME option.

Where neither of the above functions is available, Python falls back on select() as before.
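
The “sleep until this absolute time” idea is also useful at the Python level. The sketch below is purely my own illustration of the principle and has nothing to do with how CPython implements time.sleep() in C. Both function names are hypothetical, and it uses time.monotonic() so that wall clock changes don’t affect it.

```python
import time

def sleep_until(deadline):
    """Sleep until time.monotonic() reaches deadline (illustrative helper)."""
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return
        time.sleep(remaining)

def run_periodically(task, interval, repetitions):
    """Call task at fixed intervals measured from a single starting point."""
    start = time.monotonic()
    for tick in range(1, repetitions + 1):
        task()
        # Each deadline is computed from the start, so errors do not accumulate.
        sleep_until(start + tick * interval)
    return time.monotonic() - start

calls = []
elapsed = run_periodically(lambda: calls.append(time.monotonic()), 0.01, 5)
print(len(calls), elapsed)
```

Because each deadline is absolute, a slow iteration shortens the following sleep instead of pushing every later tick back.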

Windows

Prior to this release, the resolution of time.sleep() on Windows was milliseconds. As of Windows 8.1, however, higher resolution timers are available and these are used if possible.

This involves using CreateWaitableTimerExW(), although the best resolution isn’t available until Windows 10 version 1803 and involves passing the CREATE_WAITABLE_TIMER_HIGH_RESOLUTION flag to this function. At initialisation time Python tests to see whether the function supports this flag and uses it if available — this offers a resolution of 100 nanoseconds.

Conclusion

The part of writing this article during which I learned the most was undoubtedly going through the changes to the sqlite3 module, not only going through the changes themselves, but as a side effect discovering a number of other SQLite features which were already supported but which I’d completely missed. I’ve always had a soft spot for SQLite as a local data store, and the ability to use an in-memory database and serialise it on-demand is particularly tempting for those cases where you want to store structured information without having to worry about implementing your own serialisation to and from files.

The addition of hashlib.file_digest() is also a welcome one — I must have written something like this at least ten or fifteen times in the past, and it’s always been with a slight annoyance that something like this doesn’t already exist. The fact that the implementation is nicely optimised is a bonus. In software engineering we’re often told “don’t reinvent the wheel”, but it’s still important to make sure that other people’s wheels you use instead aren’t in an awful state.

Finally, I like the changes to time.sleep(), particularly on Linux — the increased resolution is, to me, a very minor point, but the robustness against realtime clock changes is a notable improvement.

That about wraps it up for now. I’ve got about eleven more modules whose changes I want to cover, and I think I should be able to get them all sorted out in the next article. Mind you, I’d originally intended to fit them all in this one before I realised how long it was getting, so it’s by no means certain that my optimism will be justified!


  1. I don’t know how you can jump in if you’re buckled up, but it’s my article and I’ll mix my metaphors if I want to. 

  2. A compound SELECT is a series of different SELECT statements connected by compound operators such as UNION, INTERSECT and EXCEPT.

  3. Unfortunately how the permissions are stored in ZipInfo is not made particularly clear. On Unix systems, at least, if you take the external_attr attribute of a ZipInfo and shift it down 16 bits, then you can test the resultant bitfield against the same permissions bits as defined in the stat module (e.g. stat.S_IFDIR to indicate a directory, stat.S_IXUSR to indicate owner execute permission). Sadly, the structure of this bitfield is platform-specific and very poorly documented, which irritates me — perhaps a topic for a future article! 

  4. At least you’d better be reading them in fixed-size chunks — if you’re in the habit of reading an entire file into memory at once then you’d better be very sure a potential attacker can’t influence the size of the file or you’ve just created yourself a simple DoS attack on your application. 

  5. If you’re wondering where this method is documented, look at the documentation for the io.RawIOBase object in the standard library. 
