In this series looking at features introduced by every version of Python 3, we continue our tour of changes in Python 3.11, covering some more standard library changes in numerical modules, path manipulation, SQLite support and more.
This is the 28th of the 32 articles that currently make up the “Python 3 Releases” series.
Following on from the previous article, this is probably my penultimate look at the new Python 3.11 release. In it we’ll run through more changes to the standard library. There’s a fair bit to get through here, so buckle up and let’s jump in!1
There are a handful of changes in the mathematical modules in this release, with some small changes to fractions and some slightly more substantial changes to math.
There are a couple of changes to fractions, the first being that underscores are now supported in numbers, as per PEP 515, when initialising a Fraction from a string.
>>> import fractions
>>> fractions.Fraction("3_200/100_000")
Fraction(4, 125)
The second change is that Fraction now supports an __int__() method. This doesn’t change too much, since it already offered __trunc__(), which meant you could still pass one to int() and get a truncated integer value, because int() falls back on __trunc__() if no __int__() method is found. Though whilst we’re on the subject, that delegation is deprecated in Python 3.11, so it will presumably stop working in a few releases anyway.
It’s also interesting to note there’s a good reason that __int__() wasn’t added to Fraction originally: before the advent of the __index__() and __trunc__() methods, the meaning of __int__() was rather overloaded — it meant you could use the wrong type in contexts such as being passed to chr(), where things like a Fraction or Decimal really shouldn’t have worked. Due to the deprecation periods, this sort of thing only actually started raising TypeError in Python 3.10.
Anyway, past and future deprecations aside, the main difference in this particular release is that the existence of the __int__() method means a Fraction instance is now recognised as an instance of typing.SupportsInt.
>>> x = fractions.Fraction(20, 3)
>>> int(x)
6
>>> x.__int__()
6
>>> import typing
>>> isinstance(x, typing.SupportsInt)
True
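Beyond the isinstance() check, this also helps with static typing. Below is a minimal sketch of my own (the truncate() helper is hypothetical) showing a function annotated with SupportsInt happily accepting a Fraction.

from fractions import Fraction
from typing import SupportsInt

# Hypothetical helper: accepts anything declaring __int__(), which as of
# Python 3.11 includes Fraction.
def truncate(value: SupportsInt) -> int:
    return int(value)

print(truncate(Fraction(20, 3)))    # prints 6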
The math module now exposes a couple of new functions from libm:

- math.exp2(n) returns $ 2^{n} $.
- math.cbrt(n) returns $ \sqrt[3]{n} $.

There are also a couple of small changes for better IEEE 754 compliance.
exp2()

I think this is really just a consistency change, exposing the exp2() function in the underlying libm library, because Python already has 2**x and math.pow(2, x). However, when you write an implementation for a fixed base, it can be possible to be more accurate across a wider range of exponents. I’m not qualified to comment on the accuracy of the exp2() implementation, but there are cases where it differs from math.pow() — in the snippet below you can see it differed in around a quarter of a percent of a million randomly selected values.
>>> import math
>>> math.exp2(1.5)
2.8284271247461903
>>> 2 ** 1.5
2.8284271247461903
>>>
>>> import random
>>> count = 0
>>> num_samples = 1_000_000
>>> for n in range(num_samples):
...     x = random.uniform(-1000.0, 999.0) + random.random()
...     if math.pow(2, x) != math.exp2(x):
...         count += 1
...
>>> count * 100 / num_samples
0.2348
cbrt()

Another libm function being exposed, but there’s more of a case for this one than exp2(), since math.pow() doesn’t support negative bases with fractional exponents, raising a ValueError. The ** operator works, although for negative values it returns the complex principal root rather than the real one, so the two necessarily differ there. As you can see in the snippet below, the results differ in around 87% of cases overall, and it’s reasonable to assume that cbrt() is the more accurate in these cases.
>>> math.cbrt(1.860867)
1.23
>>> math.pow(1.860867, 1/3)
1.23
>>>
>>> count = 0
>>> num_samples = 1_000_000
>>> for n in range(num_samples):
...     x = random.uniform(-1000.0, 999.0) + random.random()
...     if math.cbrt(x) != x ** (1/3):
...         count += 1
...
>>> count * 100 / num_samples
86.5953
A couple of small changes here. The first is that math.nan can always be assumed to be available — previously it may not have been defined if IEEE 754 support wasn’t available. The second change is that math.pow(0.0, -math.inf) and math.pow(-0.0, -math.inf) now return math.inf as the IEEE 754 spec requires — previously they raised ValueError.
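To illustrate the second of those changes, on a Python 3.11 build:

>>> import math
>>> math.pow(0.0, -math.inf)
inf
>>> math.pow(-0.0, -math.inf)
inf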
In my view, perhaps the most notable change is the direct consequence of these changes: that the Python interpreter build now requires IEEE 754 support and will fail without it. I don’t see this being a major issue for many people, mind you, as I very much doubt Python would be used on any platforms so minimal they don’t support floating point.
The glob() and rglob() methods of the pathlib.Path object and its subclasses did not quite behave correctly prior to Python 3.11 when the pattern supplied ended with a directory separator. In this case the methods are supposed to return only matches which are themselves directories, but the trailing separator was being ignored.
This has been corrected in Python 3.11, as demonstrated in the snippet below.
>>> import pathlib
>>> list(pathlib.Path("/etc").glob("sudo*"))
[PosixPath('/etc/sudoers.d'), PosixPath('/etc/sudoers'), PosixPath('/etc/sudo_lecture')]
>>> list(pathlib.Path("/etc").glob("sudo*/"))
[PosixPath('/etc/sudoers.d')]
A number of changes landed in sqlite3 in this release. I don’t quite know why there’s been a sudden burst of activity on this module, perhaps the core maintainer finally had some free time — but whatever the reason, there are some welcome changes here!
Due to the number of changes this section is a little long, so if you’re not remotely interested in SQLite, skip to the Data Compression and Archiving section.
There are a few changes that I think are worth drilling into here. The ones where I’ve gone into a little detail and given their own subsections are:

- Setting per-connection resource limits
- Handling exceptions raised in user-defined callbacks
- Serialising databases to and from bytes
- File-like access to BLOB fields

There’s also a final section which groups together a set of changes which I felt only warranted a brief mention.
The Connection object now offers setlimit() and getlimit() methods to access SQLite’s runtime limits API. This allows various resource limitations to be set on a per-connection basis, with the following list showing the limits available.
- SQLITE_LIMIT_LENGTH: The maximum size of any string, BLOB or table row, in bytes.
- SQLITE_LIMIT_SQL_LENGTH: The maximum length of an SQL statement, in bytes.
- SQLITE_LIMIT_COLUMN: The maximum number of columns in a table definition or in a SELECT result set, the number of terms in GROUP BY and ORDER BY clauses, and the number of terms in an INSERT statement.
- SQLITE_LIMIT_EXPR_DEPTH: The maximum depth of the parse tree for any expression.
- SQLITE_LIMIT_COMPOUND_SELECT: The maximum number of SELECT statements in a compound SELECT2.
- SQLITE_LIMIT_VDBE_OP: The maximum number of instructions in a virtual machine program used to implement an SQL statement.
- SQLITE_LIMIT_FUNCTION_ARG: The maximum number of arguments to an SQL function.
- SQLITE_LIMIT_ATTACHED: The maximum number of attached databases.
- SQLITE_LIMIT_LIKE_PATTERN_LENGTH: The maximum length of the pattern argument to the LIKE or GLOB operators. This is limited to avoid denial of service attacks which could result from very large patterns (i.e. millions of bytes).
- SQLITE_LIMIT_VARIABLE_NUMBER: The maximum index number of any parameter in an SQL statement.
- SQLITE_LIMIT_TRIGGER_DEPTH: The maximum depth of recursion for triggers, as enabled with PRAGMA recursive_triggers.
- SQLITE_LIMIT_WORKER_THREADS: The maximum number of auxiliary worker threads that a single prepared statement may start.
Each setting has a hard limit, which is set at compile time, and trying to increase a value above this will silently set it to the hard limit value instead. The return value of setlimit() is whatever the previous value was, whether or not it differs. The non-destructive way to query the current value is to call getlimit().
The snippet below shows us querying a table which doesn’t exist and (as expected) getting an error to that effect. But then we query the limit on the length of SQL statements, change it to an absurdly small 10 bytes, and see that the error we get back has changed.
>>> import sqlite3
>>> conn = sqlite3.connect(":memory:")
>>> conn.execute("SELECT * FROM some_table_that_does_not_exist")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
sqlite3.OperationalError: no such table: some_table_that_does_not_exist
>>> conn.getlimit(sqlite3.SQLITE_LIMIT_SQL_LENGTH)
1000000000
>>> conn.setlimit(sqlite3.SQLITE_LIMIT_SQL_LENGTH, 10)
1000000000
>>> conn.getlimit(sqlite3.SQLITE_LIMIT_SQL_LENGTH)
10
>>> conn.execute("SELECT * FROM some_table_that_does_not_exist")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
sqlite3.DataError: query string is too large
There are a few places within sqlite3 where you can register your own callback functions which are directly invoked from the SQLite code. Some examples of functions which register such callbacks are:

- create_function() declares a user function which can be called from SQL.
- create_aggregate() specifies a class with methods to determine how to aggregate multiple values into one in a customised manner.
- create_collation() is a way of implementing custom sort orderings using a custom function.
- set_authorizer() registers a callback which is invoked to control access to columns and tables.
- set_progress_handler() sets a callback to be invoked every N instructions of the virtual machine, so a single thread can still perform background tasks during long-running queries.
- set_trace_callback() invokes the specified function for every SQL statement which is run.

Since these callbacks are user-defined Python code they can, of course, raise exceptions. Because the calls pass through the SQLite code, however, preserving the tracebacks doesn’t come for free. As a result, you can decide whether you want them by calling sqlite3.enable_callback_tracebacks(). In the snippet below you can see the default behaviour, where such exceptions are silently swallowed:
>>> import sqlite3
>>> def naughty():
... raise ValueError("XXX")
...
>>> def callback(stmt):
... print(f"Running: {stmt}")
... naughty()
...
>>> conn = sqlite3.connect(":memory:")
>>> conn.set_trace_callback(callback)
>>> cursor = conn.execute("SELECT 1")
Running: SELECT 1
If you enable tracebacks, you don’t get to catch the exception, but they are printed, as demonstrated in the continuing snippet below.
>>> sqlite3.enable_callback_tracebacks(True)
>>> try:
... cursor = conn.execute("SELECT 1")
... except Exception as exc:
... print(f"Caught exception: {exc}")
...
Running: SELECT 1
Exception ignored in: <function callback at 0x10d6d72e0>
Traceback (most recent call last):
File "<stdin>", line 3, in callback
File "<stdin>", line 2, in naughty
ValueError: XXX
You can see that the exception wasn’t caught, but it was printed.
So far so Python 3.10 — the difference in Python 3.11 is simply that instead of calling PyErr_Print() to display the exception, it calls PyErr_WriteUnraisable() to trigger an “unraisable exception”. By default this is much the same, but it means you can register another callback using sys.unraisablehook which gives you some chance to deal with it. The snippet below shows how you can do this (with minor edits for formatting).
>>> import sys
>>> def exception_hook(exc):
... print(f"Got exception: {exc}")
...
>>> sys.unraisablehook = exception_hook
>>> cursor = conn.execute("SELECT 1")
Running: SELECT 1
Got exception: UnraisableHookArgs(
exc_type=<class 'ValueError'>,
exc_value=ValueError('XXX'),
exc_traceback=<traceback object at 0x10d72cf00>,
err_msg=None,
object=<function callback at 0x10d6d72e0>)
Support for the SQLite serialisation API has been added in the form of two new methods on the sqlite3.Connection object called, appropriately enough, serialize() and deserialize().
Calling serialize() will produce a bytes object which represents the content of the database. For the common case of a database backed by a file on disk, the serialised version will simply be a copy of the file. For an in-memory database, however, it will be the byte stream that would be written to disk if the database were saved.
The deserialize() method performs the inverse, converting a serialised byte stream into an in-memory database. Whatever the connection is currently attached to will be disconnected and, in the case of an in-memory database, dropped.
This is demonstrated in the simple example below.
>>> import sqlite3
>>> conn = sqlite3.connect(":memory:")
>>> cur = conn.cursor()
>>> cur.execute("BEGIN")
<sqlite3.Cursor object at 0x105f036c0>
>>> cur.execute("CREATE TABLE foo (one INTEGER PRIMARY KEY, two TEXT)")
<sqlite3.Cursor object at 0x105f036c0>
>>> cur.execute("INSERT INTO foo (one, two) VALUES (1, 'aaa')")
<sqlite3.Cursor object at 0x105f036c0>
>>> cur.execute("INSERT INTO foo (one, two) VALUES (2, 'bbb')")
<sqlite3.Cursor object at 0x105f036c0>
>>> cur.execute("COMMIT")
<sqlite3.Cursor object at 0x105f036c0>
>>> cur.execute("SELECT * FROM foo")
<sqlite3.Cursor object at 0x105f036c0>
>>> cur.fetchall()
[(1, 'aaa'), (2, 'bbb')]
>>>
>>> data = conn.serialize()
>>> type(data)
<class 'bytes'>
>>> len(data)
8192
>>> cur.execute("DELETE FROM foo")
<sqlite3.Cursor object at 0x105f036c0>
>>> cur.execute("SELECT * FROM foo")
<sqlite3.Cursor object at 0x105f036c0>
>>> cur.fetchall()
[]
>>> conn.deserialize(data)
>>> cur = conn.cursor()
>>> cur.execute("SELECT * FROM foo")
<sqlite3.Cursor object at 0x105f03640>
>>> cur.fetchall()
[(1, 'aaa'), (2, 'bbb')]
You can see that an in-memory database is created with a simple table foo with two records in it. This is serialised into the data variable, and then the entries in the in-memory version are wiped. Upon the call to deserialize() the data is restored in memory as it was, and the records are back.
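An obvious use for this pair is snapshotting an in-memory database to disk without going via a file-backed connection. The sketch below is my own illustration rather than anything from the release notes, and the snapshot.db filename is hypothetical.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")
conn.execute("INSERT INTO notes VALUES ('remember the milk')")
conn.commit()

# Persist a point-in-time copy of the whole database to a file.
with open("snapshot.db", "wb") as out_fd:
    out_fd.write(conn.serialize())

# Later, or in another process, load the snapshot into a fresh connection.
restored = sqlite3.connect(":memory:")
with open("snapshot.db", "rb") as in_fd:
    restored.deserialize(in_fd.read())
print(restored.execute("SELECT body FROM notes").fetchall())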
Over the years, databases have increasingly gone from being purely oriented around structured records of fundamental types, such as integers and small strings, to being more general purpose data stores. Examples of this include content management systems or wikis, which are really document-oriented storage but are often built on top of relational backend databases, sometimes because they date from before NoSQL databases enjoyed mainstream popularity.
The BLOB type in SQLite works well enough for storing data of reasonable size, with the usual limit being a billion bytes (~953 MB). At these sizes, however, the API’s approach of loading entire values into memory in order to use them becomes cumbersome — memory footprints of hundreds of MB are, at best, inefficient and may be impractical.
Fortunately the SQLite API offers a file-like view of these values, and as of Python 3.11 this API is exposed as part of the sqlite3 module.
The blobopen() method on sqlite3.Connection opens a BLOB value like a file. A table, column and row are specified as arguments, and an sqlite3.Blob object is returned, which is a file-like object providing the usual read(), write(), seek() and tell() methods. Also, just as with normal file objects, the Blob object can be used as a context manager so it’s automatically closed.
Take a look at the code below, which illustrates this.
>>> conn = sqlite3.connect(":memory:")
>>> conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, data BLOB)")
<sqlite3.Cursor object at 0x10f3436c0>
>>> conn.execute("INSERT INTO foo (id, data) VALUES (123, zeroblob(64))")
<sqlite3.Cursor object at 0x10f343640>
>>> conn.execute("INSERT INTO foo (id, data) VALUES (456, zeroblob(64))")
<sqlite3.Cursor object at 0x10f3436c0>
>>> with conn.blobopen("foo", "data", 123) as blob_fd:
... blob_fd.write(b"Hello")
... blob_fd.write(b", ")
... blob_fd.write(b"world!")
...
>>> with conn.blobopen("foo", "data", 123, readonly=True) as blob_fd:
... print(blob_fd.read(16))
...
b'Hello, world!\x00\x00\x00'
Hopefully this is fairly straightforward, but there are a couple of quirks. Firstly, blobopen() can’t change the size of a blob, so the INSERT statements above use the SQLite function zeroblob() to create BLOB values of 64 bytes filled with null bytes.
Secondly, you’ll notice that bytes objects are read and written — str and other Unicode-aware objects are not supported, and must be encoded/decoded as needed.
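To make that concrete, here’s a small sketch of my own showing one way to round-trip a str through a blob. Note that zeroblob() is sized to the encoded length, since blobopen() can’t grow the value afterwards.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body BLOB)")

text = "naïve café data"
encoded = text.encode("utf-8")    # str values must be encoded by hand

# Size the blob up front to match the encoded byte length.
conn.execute("INSERT INTO docs (id, body) VALUES (1, zeroblob(?))",
             (len(encoded),))

with conn.blobopen("docs", "body", 1) as blob_fd:
    blob_fd.write(encoded)

with conn.blobopen("docs", "body", 1, readonly=True) as blob_fd:
    print(blob_fd.read().decode("utf-8"))    # decode on the way back out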
There are a number of small changes which don’t need much explanation, which I’ve grouped below.

- The set_authorizer() method sets a callback that’s used in SQLite’s authorisation callbacks scheme. As of Python 3.11 it can also accept None to remove the callback.
- create_collation() creates a new collation, as mentioned earlier. In this release the names of collations can now also include Unicode characters. Passing invalid characters raises UnicodeEncodeError.
- Exceptions raised by the module now expose the underlying SQLite error code and name via the new sqlite_errorcode and sqlite_errorname attributes of the exception.
- The sqlite3.threadsafety attribute, which returns the level of thread safety compiled into the SQLite module, is now determined dynamically from the underlying library rather than being hard-coded. The value is an integer which corresponds to the defined meanings for this field in DB API 2.0.
- A new method create_window_function() has been added which creates a window function. This is similar to an aggregation function, but instead of collapsing multiple values into fewer rows, it modifies rows based on surrounding rows — for example, to calculate a moving average, as sketched below.
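The documentation is a little light on examples here, so below is a minimal sketch of my own: a moving average over a two-row window. It assumes the underlying SQLite library is version 3.25 or newer, which is where window function support arrived.

import sqlite3

class WindowAverage:
    """Maintain a running average over the current window frame."""
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def step(self, value):      # called as each row enters the frame
        self.count += 1
        self.total += value

    def inverse(self, value):   # called as each row leaves the frame
        self.count -= 1
        self.total -= value

    def value(self):            # current result for this frame
        return self.total / self.count if self.count else None

    def finalize(self):         # final result
        return self.value()

conn = sqlite3.connect(":memory:")
conn.create_window_function("win_avg", 1, WindowAverage)
conn.execute("CREATE TABLE readings (t INTEGER, v REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [(1, 10.0), (2, 20.0), (3, 60.0), (4, 0.0)])
query = """
    SELECT t, win_avg(v) OVER (
        ORDER BY t ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
    ) FROM readings
"""
print(conn.execute(query).fetchall())
# [(1, 10.0), (2, 15.0), (3, 40.0), (4, 30.0)]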
A few improvements to the zipfile module, which provides an interface for manipulating files in ZIP format.
The most significant improvement in zipfile in this release is the addition of the mkdir() method for creating an empty directory directly within an archive file. There are two ways to call this method:

- Pass the name of the new directory as a string, with an optional mode argument which specifies the permissions (defaulting to 511, i.e. 0o777 or rwxrwxrwx).
- Pass a zipfile.ZipInfo instance instead of a string, where attributes such as the filename and permissions are already stored3.

>>> import zipfile
>>> zip_file = zipfile.ZipFile("/tmp/test.zip", mode="a")
>>> print("\n".join(repr(i) for i in zip_file.infolist()))
<ZipInfo filename='testdir/' filemode='drwxr-xr-x' external_attr=0x10>
<ZipInfo filename='testdir/some_file' filemode='-rw-r--r--' file_size=24>
<ZipInfo filename='testdir/another_file' filemode='-rw-r--r--' file_size=30>
>>> zip_file.mkdir("testdir/subdirectory", mode=0o750)
>>> print("\n".join(repr(i) for i in zip_file.infolist()))
<ZipInfo filename='testdir/' filemode='drwxr-xr-x' external_attr=0x10>
<ZipInfo filename='testdir/some_file' filemode='-rw-r--r--' file_size=24>
<ZipInfo filename='testdir/another_file' filemode='-rw-r--r--' file_size=30>
<ZipInfo filename='testdir/subdirectory/' filemode='drwxr-x---' external_attr=0x10>
Due to the poor state of the documentation, it’s quite hard to create an accurate ZipInfo instance directly — in particular, some of the fields are platform-specific. However, do bear in mind there’s a factory method ZipInfo.from_file() which creates a ZipInfo instance representing a file on the filesystem. This is probably the easiest way to factor out all those platform-specific concerns, provided you have a real file to use as a model.
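For example, here’s a sketch of my own, assuming some_local_dir exists on disk, which models a new archive directory on a real one.

import zipfile

with zipfile.ZipFile("/tmp/test.zip", mode="a") as zip_file:
    # Build the ZipInfo from a real directory so the platform-specific
    # attribute bits are filled in for us, then create the archive entry.
    info = zipfile.ZipInfo.from_file("some_local_dir",
                                     arcname="testdir/mirrored")
    zip_file.mkdir(info)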
A couple of other, smaller, changes:

- There’s a new metadata_encoding parameter to the ZipFile constructor which specifies the encoding of the metadata within the ZIP file. This is only valid when reading ZIP files — if mode is anything other than "r", the constructor raises a ValueError. Also, there is only a single setting per ZipFile instance, so there’s no way to deal with cases where different members in the same file use different encodings for their filenames. Finally, any flags in the ZIP file header which specify an encoding override this setting.
- The zipfile.Path class represents a path within a specific zip file, including the path of the zip file itself. As of this release, it offers the additional attributes stem, suffix and suffixes, which have the same meanings as for pathlib.PurePath.
A handy new helper function for hashing file-like objects, and a couple of changes to hashing function implementations.
Hashing the content of files is not uncommon, but doing so with hashlib is a little tedious, because you end up writing the same boilerplate every time: reading the file in fixed-size chunks4 and passing them into the hash object until you’re done. It’s not difficult, but it’s dull to have to write.
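For reference, the sort of boilerplate we’re talking about looks something like the sketch below (my own, with an arbitrary chunk size).

import hashlib

def file_sha256(path, chunk_size=64 * 1024):
    """Hash a file's content in fixed-size chunks to bound memory usage."""
    hash_obj = hashlib.sha256()
    with open(path, "rb") as in_fd:
        while chunk := in_fd.read(chunk_size):
            hash_obj.update(chunk)
    return hash_obj.hexdigest()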
Thankfully there’s now a new helper, hashlib.file_digest(), to do this for you. It takes an open file object and either the name of a hash algorithm or a callable which returns a suitable hash object.
The function constructs an instance of the hash object, reads file data in fixed-size chunks and passes them into the hash, which is then returned so the calling code can retrieve the digest in whatever format it wants (typically raw binary or hex).
>>> import hashlib
>>> with open("/usr/share/dict/words", "rb") as in_fd:
... hash_obj = hashlib.file_digest(in_fd, hashlib.sha1)
...
>>> hash_obj.digest()
b'\xa6.\xdf\x86\x85\x92\x0f}Z\x95\x110 c\x1c\xde\xbd\x18\xa1\x85'
>>> hash_obj.hexdigest()
'a62edf8685920f7d5a95113020631cdebd18a185'
An important note here is that the file must be open in binary mode, or you’ll get a ValueError when you pass it to file_digest(). Aside from the obvious fact that hash functions are byte-oriented, so this makes logical sense, it’s also because the code uses the readinto() method5 of the file object to read data into a pre-allocated bytearray object — this method is only available in binary mode.
It’s also interesting to note that it uses memoryview to access the content of the bytearray without copying — this is the sort of efficiency it’s easy to forget when rolling your own implementation, demonstrating the value of having this in the standard library.
BLAKE2 is a hash function, defined in RFC 7693, which is an evolution of one of the entrants for NIST’s competition to find the hash function to use as the SHA-3 standard. It did well, making it to the final 5, but was eventually pipped to the post by the Keccak family of functions.
With its performance improvements over the original BLAKE, which was assessed for the competition, it has some claims to offer equivalent protection to SHA-3 but at better performance. The newer BLAKE3 standard is much faster still, but not yet offered in the Python standard library, so outside the scope of this article.
Since being introduced into hashlib back in Python 3.6, the implementation has been one stored in the CPython source rather than linking to an external library. However, the official libb2 library is more optimised and reduces the maintenance burden on the Python maintainers, so it makes sense to use it. As a first step in that direction, in Python 3.11 the build now uses the library where it’s available when Python is being compiled, but still maintains its own implementation as a backup.
As an aside, if you’re compiling Python on a system with libb2 installed but you don’t want to create a dependency on it, the notes on issue bpo-47095 have some advice towards the end.
tiny_sha3 When OpenSSL Unavailable

Another change of implementation, this time for SHA-3. This one isn’t going to affect most people: as of Python 3.10, if OpenSSL is used then version 1.1.1 is required, and since this has an optimised implementation, it’s used whenever it’s available.
It’s therefore only in cases where OpenSSL support is disabled at compile time that, as of Python 3.11, the public domain tiny_sha3 module replaces the previous vendored copy of the Keccak Code Package in the source repository.
tiny_sha3 is slower, but the stripped library size is around 70% smaller — this means the vast majority of users, who have OpenSSL support, get a smaller build, at the expense of a (presumably) tiny minority suffering a performance drop in SHA-3.
A couple of small enhancements to the logging module to improve introspection and the use of syslog, and some improvements to time.sleep() to use higher-resolution clocks.
A couple of small changes, the first of which is generally useful, the second of interest mainly to people who use SysLogHandler.
There are occasions where it’s useful to present a list of the available logging levels — for example, as part of a --help option where the log level can be supplied as a command-line parameter. Prior to Python 3.11, however, there’s been no way to do this short of hard-coding them; as of this release, there’s the getLevelNamesMapping() function.
This returns a dict mapping string level names to their numeric values.
>>> import logging
>>> from pprint import pprint
>>> pprint(logging.getLevelNamesMapping())
{'CRITICAL': 50,
'DEBUG': 10,
'ERROR': 40,
'FATAL': 50,
'INFO': 20,
'NOTSET': 0,
'WARN': 30,
'WARNING': 30}
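As a sketch of the --help use case mentioned above, assuming nothing beyond the standard library, the mapping slots neatly into argparse.

import argparse
import logging

levels = logging.getLevelNamesMapping()
parser = argparse.ArgumentParser()
parser.add_argument("--log-level", choices=sorted(levels), default="WARNING",
                    help="logging level name")
args = parser.parse_args()
logging.basicConfig(level=levels[args.log_level])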
In this release the SysLogHandler provided by the logging.handlers package has been updated so that its behaviour more closely reflects that of SocketHandler and DatagramHandler with respect to socket opening.
Prior to this release, if the underlying socket for SysLogHandler became closed, the handler would just raise an error. Now the connection logic has been factored out into a createSocket() method, which is called both during __init__() and, if the current socket is closed, every time a logging event is emitted. This means that the handler should only fail if the remote socket is unavailable at the time the event is emitted.
The implementation of time.sleep() is platform-dependent, but on Unix and Windows it has changed in this release to use higher-resolution timers where available.
On Unix, sleep() has typically been implemented using the select() call, which allows for microsecond-resolution delays. The resolution isn’t a major issue, since a microsecond is still a very short wait, but there is a more serious issue — select() bases its delays on the current time of day and hence will be impacted by clock changes.
Automated systems like ntpd and chrony usually try very hard to only change system clocks gradually (generally known as “slewing” them). However, it’s hard to rule out sudden jumps if clocks become badly unsynchronised, and it wouldn’t be great for application resilience if such a jump caused a sleep() based on select() to suddenly wait for an hour instead of a few seconds.
Fortunately, on some platforms there are functions less prone to these issues, which also offer higher-resolution timing: nanosleep() and clock_nanosleep(). Both allow the current thread to pause for a period specified in nanoseconds, but clock_nanosleep() offers more flexibility, such as selecting the system clock to be used.
As of Python 3.11, time.sleep() on Unix systems will prefer clock_nanosleep() with CLOCK_MONOTONIC where this function is available. It also passes the additional TIMER_ABSTIME flag in this case, which means the behaviour of the function becomes “sleep until this absolute time” instead of “sleep for this many nanoseconds”. This avoids problems where the interval becomes inaccurate under a high rate of signals, because each signal interrupts the wait and the recalculation of the remaining time may suffer rounding errors.
Where clock_nanosleep() isn’t available, the implementation falls back on nanosleep(). This also uses CLOCK_MONOTONIC on Linux, but potentially CLOCK_REALTIME on other platforms, as that is what POSIX specifies. It also doesn’t offer the TIMER_ABSTIME option.
Where neither of the above functions is available, Python falls back on select() as before.
Prior to this release, the resolution of time.sleep() on Windows was milliseconds. As of Windows 8.1, however, higher-resolution timers are available, and these are now used where possible.
This involves using CreateWaitableTimerExW(), although the best resolution isn’t available until Windows 10 version 1803 and involves passing the CREATE_WAITABLE_TIMER_HIGH_RESOLUTION flag to this function. At initialisation time Python tests whether the function supports this flag and uses it if available — this offers a resolution of 100 nanoseconds.
The part of writing this article during which I learned the most was undoubtedly going through the changes to the sqlite3 module, not only going through the changes themselves, but as a side effect discovering a number of other SQLite features which were already supported but which I’d completely missed. I’ve always had a soft spot for SQLite as a local data store, and the ability to use an in-memory database and serialise it on demand is particularly tempting for those cases where you want to store structured information without having to worry about implementing your own serialisation to and from files.
The addition of hashlib.file_digest() is also a welcome one — I must have written something like this at least ten or fifteen times in the past, and it’s always been with a slight annoyance that something like this doesn’t already exist. The fact that the implementation is nicely optimised is a bonus. In software engineering we’re often told “don’t reinvent the wheel”, but it’s still important to make sure that other people’s wheels you use instead aren’t in an awful state.
Finally, I like the changes to time.sleep(), particularly on Linux — the increased resolution is, to me, a very minor point, but the robustness against realtime clock changes is a notable improvement.
That about wraps it up for now. I’ve got about eleven more modules whose changes I want to cover, and I think I should be able to get them all sorted out in the next article. Mind you, I’d originally intended to fit them all in this one before I realised how long it was getting, so it’s by no means certain that my optimism will be justified!
I don’t know how you can jump in if you’re buckled up, but it’s my article and I’ll mix my metaphors if I want to. ↩
A compound SELECT is a series of different SELECT statements connected by compound operators such as UNION, INTERSECT and EXCEPT. ↩
Unfortunately, how the permissions are stored in ZipInfo is not made particularly clear. On Unix systems, at least, if you take the external_attr attribute of a ZipInfo and shift it down 16 bits, you can test the resultant bitfield against the same permission bits as defined in the stat module (e.g. stat.S_IFDIR to indicate a directory, stat.S_IXUSR to indicate owner execute permission). Sadly, the structure of this bitfield is platform-specific and very poorly documented, which irritates me — perhaps a topic for a future article! ↩
At least you’d better be reading them in fixed-size chunks — if you’re in the habit of reading an entire file into memory at once then you’d better be very sure a potential attacker can’t influence the size of the file or you’ve just created yourself a simple DoS attack on your application. ↩
If you’re wondering where this method is documented, look at the documentation for the io.RawIOBase object in the standard library. ↩