☑ What’s New in Python 3.9 - Library Changes

5 Nov 2022 at 2:32PM in Software

In this series looking at features introduced by every version of Python 3, we continue our look at Python 3.9 by going through the notable changes in the standard library. These include concurrency improvements with changes to asyncio, concurrent.futures, and multiprocessing; networking features with enhancements to ipaddress, imaplib, and socket; and some additional OS features in os and pathlib.

This is the 20th of the 22 articles that currently make up the “Python 3 Releases” series.

Following on from the previous article looking at new features in Python 3.9, this one discusses changes to existing library modules. There’s a smaller set of changes than some of the previous articles here, partly because this release seemed to have fewer, and partly because I’ve tried to be a little more selective about which ones are useful enough to be worth covering.

Numeric and Mathematical Modules

math

A few small changes in math, the first of which is simply that math.gcd() now accepts any number of arguments rather than exactly two. It returns the largest positive integer which is a divisor of all the arguments. Arguments equal to zero are effectively ignored, and the result is zero only if every argument is zero (or if no arguments are passed at all).

On a related note there’s a new math.lcm() function which returns the lowest common multiple of a specified set of integers, that is, the smallest positive value of which every specified integer is a divisor. If any of the arguments is zero then zero is returned.
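As a quick illustration of the variadic forms, here is a minimal sketch; the specific numbers are arbitrary choices of mine. Note in particular how zeros are treated by each function.

```python
import math

# gcd() now takes any number of integers, not just two.
print(math.gcd(12, 18, 24))   # 6
print(math.gcd(0, 5))         # 5: a zero argument doesn't force the result to zero
print(math.gcd(0, 0))         # 0: only all-zero input gives zero

# lcm() is new in 3.9 and is also variadic.
print(math.lcm(4, 6, 10))     # 60
print(math.lcm(3, 0))         # 0: any zero argument gives zero
```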

Another new function is math.nextafter(), which returns the next distinctly representable float value after a specified one in a specified direction. It takes two parameters, the first being the starting point and the second being a target value to indicate the direction in which the next value should be located. This mirrors the nextafter() function in libc.

>>> import math
>>> math.nextafter(0.0, 1.0)
5e-324
>>> math.nextafter(5.0, 10.0)
5.000000000000001
>>> math.nextafter(3000.0, -math.inf)
2999.9999999999995

Finally, there is a third new function math.ulp() which returns the value of the least-significant bit of the specified value — in other words, the difference in the magnitude of the float value between the least-significant bit being zero or one. Due to the nature of floating point values, this delta becomes larger with the magnitude of the value. The name of the function comes from the commonly used term for this: unit in the last place.

>>> math.ulp(1.0)
2.220446049250313e-16
>>> math.ulp(1000000.0)
1.1641532182693481e-10
>>> math.ulp(1.0e10)
1.9073486328125e-06
>>> math.ulp(1.0e100)
1.942668892225729e+84
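The two functions are closely related: for a positive finite float, the gap up to the next representable value should equal math.ulp() of that value. The sketch below checks this for a few values; gap_above() is a hypothetical helper name of my own, not part of the math module.

```python
import math

def gap_above(x):
    """Distance from x to the next representable float above it."""
    return math.nextafter(x, math.inf) - x

for x in (1.0, 1000000.0, 1.0e10):
    # For positive finite floats the gap upwards matches ulp(x).
    print(x, gap_above(x) == math.ulp(x))
```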

Generic Operating System Services

os

On Linux platforms only, the os.pidfd_open() function and os.P_PIDFD constant have been added to support a newish feature in the Linux kernel called pidfds. A detailed discussion is outside the scope of this article [2], but suffice to say you can pass the PID of an existing process to pidfd_open() to obtain a file descriptor that refers to that process and can be monitored with the usual poll() and select() mechanisms, so it’s useful for integrating into existing IO loops. Also see the discussion of PidfdChildWatcher in asyncio later in this article.
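To make that a little more concrete, here is a minimal sketch of waiting for a process to exit by polling its pidfd. The helper name wait_for_exit() is my own invention, and as a simplification it treats any OSError from pidfd_open() as "pidfds not supported here", which is cruder than real code should be.

```python
import os
import select

def wait_for_exit(pid, timeout_s):
    """Return True if the process exits within timeout_s seconds, False on timeout.

    Returns None if pidfds are unavailable (non-Linux platform or old kernel).
    """
    if not hasattr(os, "pidfd_open"):
        return None
    try:
        fd = os.pidfd_open(pid)
    except OSError:
        # Simplification: assume the kernel doesn't support pidfds.
        return None
    try:
        poller = select.poll()
        # A pidfd becomes readable once the process has terminated.
        poller.register(fd, select.POLLIN)
        return bool(poller.poll(timeout_s * 1000))
    finally:
        os.close(fd)
```

Because the pidfd is just a file descriptor, the same registration could be made with an existing event loop's poller instead of a dedicated one.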

For users of other platforms, the os.putenv() and os.unsetenv() calls are now always available — the unsetenv() call has been added to Windows, where it was previously missing, and Python now requires all other platforms to provide setenv() and unsetenv() calls for it to build successfully.

Finally, there’s a new function os.waitstatus_to_exitcode() which translates the status code returned from os.wait() and os.waitpid() into the actual exit code of the process, if the process terminated normally, or the negation of the signal used to terminate the process otherwise.

>>> import os
>>> import signal
>>> import sys
>>> import time
>>>
>>> if (pid := os.fork()) == 0:
...     sys.exit(13)
...
>>> waited_pid, status = os.wait()
>>> waited_pid == pid
True
>>> status
3328
>>> os.waitstatus_to_exitcode(status)
13
>>>
>>> if (pid := os.fork()) == 0:
...     time.sleep(3600)
...     sys.exit(10)
...
>>> os.kill(pid, signal.SIGTERM)
>>> waited_pid, status = os.wait()
>>> waited_pid == pid
True
>>> signal.SIGTERM
<Signals.SIGTERM: 15>
>>> status
15
>>> os.waitstatus_to_exitcode(status)
-15

This is definitely less tedious than the usual if/else dance one has to perform using os.WIFSIGNALED(), os.WIFEXITED() and os.WEXITSTATUS() to get this information, and proves to be handy if you don’t use higher-level abstractions provided by subprocess and the like.

One caveat worth noting is that if a process receives SIGSTOP then it hasn’t actually terminated, but merely been suspended until it receives SIGCONT. If you’re using os.wait() then it won’t return, as there’s no process termination and no status to return. However, if you use os.waitpid(), os.wait3() or os.wait4() and you pass os.WUNTRACED in the options argument then these functions will return in the case of a suspended process. The os.waitstatus_to_exitcode() function doesn’t handle this case, and raises ValueError for a status that indicates neither a normal exit nor termination by a signal, so you’ll need to first check the returned status with os.WIFSTOPPED() and, if it returns True, skip the call to os.waitstatus_to_exitcode().
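One way to wrap up that check is sketched below. The classify_status() helper and its (kind, value) return convention are hypothetical choices of mine for illustration, and it only deals with the stopped case described above.

```python
import os

def classify_status(status):
    """Hypothetical helper: turn a wait status into a (kind, value) pair."""
    # Check for a suspended process first, since waitstatus_to_exitcode()
    # must not be given such a status.
    if os.WIFSTOPPED(status):
        return ("stopped", os.WSTOPSIG(status))
    code = os.waitstatus_to_exitcode(status)
    if code < 0:
        # Negative values are the negated number of the terminating signal.
        return ("signalled", -code)
    return ("exited", code)
```

You would call this on the status returned from something like os.waitpid(pid, os.WUNTRACED).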

Concurrent Execution

concurrent.futures

A handy handful of improvements to this useful module. First up, the behaviour of the shutdown() method of the Executor class has hitherto been to wait for all futures passed to the executor to complete execution before freeing all the associated resources. The wait parameter doesn’t affect this behaviour: it merely controls whether the method returns immediately or after the shutdown is complete, but either way the shutdown itself doesn’t happen until the futures are complete. The set of futures includes those which are currently executing, as you’d expect, but also futures which haven’t yet started, despite the fact these could often be safely cancelled. As of Python 3.9 there’s a new cancel_futures parameter to this method which will cause futures not yet started to be cancelled instead of executed. Any futures which have already started execution will still be waited on, however.
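The effect of cancel_futures can be seen in a minimal sketch like the following. It uses a single worker thread so that only the first task can be running when shutdown() is called; the demo() function and the 0.2 second sleep are arbitrary choices of mine.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def demo():
    started = threading.Event()

    def slow(n):
        started.set()
        time.sleep(0.2)
        return n

    # One worker, so tasks 1-3 sit in the queue behind task 0.
    executor = ThreadPoolExecutor(max_workers=1)
    futures = [executor.submit(slow, n) for n in range(4)]
    started.wait()  # make sure the first task is already running
    # The running task is waited on; the queued ones are cancelled.
    executor.shutdown(wait=True, cancel_futures=True)
    return ["cancelled" if f.cancelled() else f.result() for f in futures]

print(demo())  # [0, 'cancelled', 'cancelled', 'cancelled']
```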

Secondly, both ThreadPoolExecutor and ProcessPoolExecutor have been updated to no longer use daemon threads [3], but instead use an internal function hook that’s similar to atexit.register(), but called at threading shutdown instead of interpreter shutdown. This change was motivated by the fact that subinterpreters no longer support daemon threads; you can find more discussion in bpo-39812.

Finally there’s a performance improvement to ProcessPoolExecutor to ensure that it always reuses idle worker processes where possible and only spawns new ones where necessary. The change also addresses an issue where the max_workers parameter was regarded as the number of initial processes to spawn regardless of need, rather than the maximum number as it should be.

Networking and Interprocess Communication

asyncio

There are a number of changes to asyncio in this release.

Deprecating reuse_address

When using asyncio.loop.create_datagram_endpoint() the parameter reuse_address has defaulted to True, which sets the SO_REUSEADDR socket option on the UDP socket thus created for families AF_INET and AF_INET6. However, at least on Linux this option creates a serious security hole as it allows other processes to bind to the same address and port, and the kernel will randomly distribute received packets between the bound processes. As a result, in the 3.9 release [1] this option defaults to False, and any attempt to set it to True will raise an exception.

>>> import asyncio
>>> import socket
>>>
>>> async def func():
...     await asyncio.get_running_loop().create_datagram_endpoint(
...         lambda: asyncio.Protocol(),
...         family=socket.AF_INET,
...         reuse_address=True)
...
>>> asyncio.run(func())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File ".../lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "<stdin>", line 2, in func
  File ".../lib/python3.9/asyncio/base_events.py", line 1331, in create_datagram_endpoint
    raise ValueError("Passing `reuse_address=True` is no "
ValueError: Passing `reuse_address=True` is no longer supported, as the usage
            of SO_REUSEPORT in UDP poses a significant security concern.

Graceful Executor Shutdown

The next change is to address the issue that the BaseEventLoop.close() method can leak dangling threads — it doesn’t wait for the default executor to close and hence the threads in the associated ThreadPoolExecutor don’t get joined. To resolve this, a new coroutine loop.shutdown_default_executor() has been added which calls shutdown() on the executor and waits for it to complete. This new coroutine is now also called by asyncio.run() right after shutting down the async generators, so you’ll benefit from the safety without code changes as long as you’re using this.
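If you manage the event loop yourself rather than using asyncio.run(), the cleanup sequence might look something like this minimal sketch. The function names blocking_io(), main() and run_manually() are placeholders of my own, and the ordering simply mirrors what the paragraph above says asyncio.run() does.

```python
import asyncio
import time

def blocking_io():
    time.sleep(0.1)
    return "done"

async def main():
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, blocking_io)

def run_manually():
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(main())
    finally:
        # Shut down async generators, then join the default executor's
        # threads, before finally closing the loop.
        loop.run_until_complete(loop.shutdown_asyncgens())
        loop.run_until_complete(loop.shutdown_default_executor())
        loop.close()

print(run_manually())  # done
```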

PidfdChildWatcher

Next up we look at the addition of PidfdChildWatcher to asyncio. This is a new child watcher, one of a set of different policies used to monitor child processes. There were four choices of policy for this purpose prior to 3.9, which offer different tradeoffs between performance overhead and the chances of conflicting with third party code which manages its own subprocesses.

I won’t go into them in detail here, but before you worry too much about the differences you might like to check out issue #94597 where there are plans for removing most of these options in release 3.12. For the purposes of this article, I’ll just concentrate on the change in release 3.9.

The new PidfdChildWatcher manages a good balance of performance and safety by using the Linux kernel feature pidfds, which were described earlier in the article in the os section. Since it doesn’t use signals or threads, it’s both safe and performant, and it scales linearly with the number of child processes. The downside is that it’s a Linux-specific extension so not available on other platforms.

to_thread

When running code asynchronously, the main hazard to performance is IO-bound blocking functions. In general this issue is avoided by using non-blocking alternatives, but some libraries may not offer such options. For these cases the new asyncio.to_thread() coroutine can come in handy.

All this does is execute the specified function in a separate thread, returning a coroutine which can be awaited to get the result when the function returns. Any additional positional and keyword arguments are passed through to the function. Hopefully the following example makes it fairly clear.

>>> import asyncio, time
>>>
>>> def blocking_function():
...     print("Starting blocking function")
...     time.sleep(3)
...     print("Ending blocking function")
...     return 123
...
>>> async def machine_goes_ping():
...     for n in range(8):
...         await asyncio.sleep(0.5)
...         print("Ping!")
...
>>> async def main():
...     print("Main started")
...     results = await asyncio.gather(
...         asyncio.to_thread(blocking_function),
...         machine_goes_ping()
...     )
...     print(f"Results: {results}")
...
>>> asyncio.run(main())
Main started
Starting blocking function
Ping!
Ping!
Ping!
Ping!
Ping!
Ending blocking function
Ping!
Ping!
Ping!
Results: [123, None]

This isn’t actually IO-specific and could, in principle, also be used for CPU-intensive functions which would otherwise block the event loop. The main caveat here is that the GIL tends to prevent such parallelisation in Python code anyway, so the separate thread doesn’t provide any benefit. However, for third party extensions which release the GIL themselves, the benefit may apply to CPU-intensive work too.

Internet Protocols and Support

imaplib

A brace of changes to IMAP support. Firstly, the IMAP4 and IMAP4_SSL constructors now allow a timeout to be specified. Yes, it’s 2020 and Python is still adding timeouts to its network-oriented blocking functions. I’m really trying very hard not to rant about why people don’t always add timeouts to every single blocking function they ever implement: it’s like implementing a file format without including a “version” field, you just don’t do it [4].

Secondly, the IMAP4 class, and hence its subclasses, now support an unselect() method. Like the close() method, this frees up any resources the server has allocated for the current mailbox and returns the connection to the freshly authenticated state. Unlike close(), however, it does not also remove any messages marked “deleted” from the mailbox.

ipaddress

The ipaddress module now supports IPv6 scoped address literals. Without wanting to dive into a detailed summary of IPv6, I’ll try and explain what this means as briefly as I can [5]. Every IPv6 address has a scope which defines where that address is valid. For unicast and anycast addresses, there are only two scopes: global and link-local. Global addresses are potentially globally routable, and are what most people will think of as IPv6 addresses. Link-local addresses are only valid on a specific network interface, and so they’re only unique if combined with the interface identifier. This is appended to the address with a suffix using a % separator, as in fe80::1ff:fe23:4567:890a%eth2 [6], and it’s this format which is now supported.

>>> import ipaddress
>>> addr = ipaddress.ip_address("fe80::1ff:fe23:4567:890a%eth2")
>>> addr.is_link_local
True
>>> addr.scope_id
'eth2'

Importing Modules

importlib

A couple of small changes to importlib in this release. First up, importlib.util.resolve_name() now raises ImportError instead of ValueError for invalid relative imports, for consistency with the builtin import statement.

A new importlib.resources.files() function has also been added which allows access to resources stored in nested containers. It returns a Traversable object which has methods to access information about itself and any objects nested within it, whether they are resources (“files”) or containers (“subdirectories”). The contents are directly available via the read_bytes() and read_text() methods, but there’s also an importlib.resources.as_file() function if a real file is required. This is used as a context manager and extracts the resource to the filesystem if necessary, returning the path of the file. It will automatically clean up any temporary files at the end of the context block.
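Here’s a minimal sketch of both styles of access. Since I don’t have a package with real data files to hand, it uses the `__init__.py` of the standard library’s json package as a stand-in resource; the first_line_of_resource() helper is a hypothetical name of my own.

```python
from importlib.resources import as_file, files

def first_line_of_resource(package, name):
    """Return the first line of a resource and whether as_file() gave a real file."""
    # files() returns a Traversable; joinpath() descends into the container.
    resource = files(package).joinpath(name)
    text = resource.read_text(encoding="utf-8")
    # as_file() yields a real filesystem path for the duration of the block.
    with as_file(resource) as path:
        on_disk = path.is_file()
    return text.splitlines()[0], on_disk

first_line, on_disk = first_line_of_resource("json", "__init__.py")
print(first_line)
print(on_disk)
```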

Python Language Services

ast

The Abstract Syntax Trees module has acquired some useful changes for those who find themselves parsing or analysing Python source code. The ast.dump() function now accepts an indent parameter to produce a more readable multiline output; it takes either a number of spaces or a string to use for each level, so indent=True as used below is equivalent to a single space. There’s also an ast.unparse() function which generates Python source code that would produce an equivalent syntax tree if parsed again.

>>> import ast
>>> tree = ast.parse("""
... def my_function(arg):
...     x = arg**2
...     return (x+10)
... """)
>>> print(ast.dump(tree, indent=True))
Module(
 body=[
  FunctionDef(
   name='my_function',
   args=arguments(
    posonlyargs=[],
    args=[
     arg(arg='arg')],
    kwonlyargs=[],
    kw_defaults=[],
    defaults=[]),
   body=[
    Assign(
     targets=[
      Name(id='x', ctx=Store())],
     value=BinOp(
      left=Name(id='arg', ctx=Load()),
      op=Pow(),
      right=Constant(value=2))),
    Return(
     value=BinOp(
      left=Name(id='x', ctx=Load()),
      op=Add(),
      right=Constant(value=10)))],
   decorator_list=[])],
 type_ignores=[])
>>> print(ast.unparse(tree))
def my_function(arg):
    x = arg ** 2
    return x + 10

Other Changes

A few more useful changes too small to warrant their own sections.

isocalendar() Now Returns namedtuple
The isocalendar() method of both datetime.date and datetime.datetime now returns a namedtuple instead of a tuple, with fields year, week and weekday.
Linux File Descriptor Lock Constants in fcntl
The fcntl module now offers constants F_OFD_GETLK, F_OFD_SETLK and F_OFD_SETLKW, which are used for Linux-specific open file description locks. These are associated with the open file description rather than with the process, in contrast with regular fcntl() locks which are per-process.
New HTTP Status Codes
HTTP status codes 103 EARLY_HINTS and 425 TOO_EARLY, as well as the ever-useful 418 IM_A_TEAPOT, are now available in http.HTTPStatus.
Added close() to multiprocessing.SimpleQueue
There’s a new close() method on SimpleQueue which closes the file descriptors used for the associated pipe. This is useful to avoid leaking file descriptors if, for example, a reference to the queue persists for longer than expected.
Added readlink() Method to pathlib.Path
This behaves in the same way as the existing os.readlink() function.
Added random.randbytes()
This is somewhat similar to the existing random.getrandbits() but instead of returning an int of specified length, it returns a bytes object of the specified length [7].
Sending Signals Via pidfd (Linux only)
As mentioned earlier in the article, support for pidfds has been added on Linux. The signal module has a new function pidfd_send_signal() which sends a signal to a process specified by a pidfd, as opposed to os.kill() which expects a PID.
Sending File Descriptors Over UDS
When using Unix Domain Sockets, i.e. those in family AF_UNIX, it’s long been possible to send and receive file descriptors over them. However, the sendmsg() and particularly recvmsg() invocations required to do so are a little fiddly. The socket module has now acquired new send_fds() and recv_fds() functions to wrap this up more conveniently.
stderr Now Line-Buffered By Default
Previously sys.stderr would always be line-buffered if connected to a TTY, and block-buffered otherwise. To ensure errors are displayed more promptly, however, it’s now always line-buffered by default. If this causes problems you can either run python with the -u switch to force stdout and stderr to be unbuffered, you can call flush() when required, or you can reopen sys.stderr with os.fdopen() passing an appropriate value for the buffering parameter.
$ python3.8 -c 'import sys; print(sys.stderr.line_buffering)'
True
$ python3.8 -c 'import sys; print(sys.stderr.line_buffering)' 2>/tmp/output
False
$ python3.9 -c 'import sys; print(sys.stderr.line_buffering)' 2>/tmp/output
True
Added reset_peak() to tracemalloc
This resets the peak size of traced memory usage to the current size, so the maximum usage of a particular scope can be monitored.
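To illustrate the socket.send_fds() and socket.recv_fds() functions from the list above, here’s a minimal sketch which passes the read end of a pipe across a socket pair. For simplicity both ends live in the same process, whereas in practice the receiver would normally be a different process; the function name pass_fd_over_socketpair() and the b"fd" message are arbitrary choices of mine.

```python
import os
import socket

def pass_fd_over_socketpair(payload):
    """Send a pipe's read end over a Unix socket pair, then read via the received copy."""
    sender, receiver = socket.socketpair()  # AF_UNIX by default on Unix
    read_end, write_end = os.pipe()
    try:
        # Some ordinary data has to accompany the descriptors.
        socket.send_fds(sender, [b"fd"], [read_end])
        data, fds, flags, addr = socket.recv_fds(receiver, 1024, 1)
        os.write(write_end, payload)
    finally:
        os.close(write_end)
        os.close(read_end)
        sender.close()
        receiver.close()
    # fds[0] is a new descriptor referring to the same pipe.
    with os.fdopen(fds[0], "rb") as reader:
        return data, reader.read()

print(pass_fd_over_socketpair(b"hello"))  # (b'fd', b'hello')
```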

Conclusion

A fairly incremental release all round, this one, but it’s good to see asyncio continues to improve at a reasonable pace. Access to pidfds on Linux also looks pretty handy, especially when multiple threads within a process need to interact with child processes — it’s a shame that this feature isn’t more portable, but I daresay these days there are quite a few commercial programmers who can assume the use of the Linux platform for in-house code.

In the next article I’ll move on to Python 3.10, just as Python 3.11 gets released — at least catching up feels more feasible than it did when I started this whole series!


  1. Actually this was also introduced into Python 3.8.1, but since I’m only looking at the initial version of each major release in these articles this is where it lands. 

  2. If you want to find out a lot about the development of pidfds, there’s an in-depth video from the Kernel Recipes 2019 conference. 

  3. Normally the Python interpreter won’t exit until all threads have completed, even after the main thread has ended. A thread marked as a daemon thread, however, doesn’t count for this purpose and will be forcefully terminated at shutdown instead. 

  4. Though of course people do still do it, as the evidence clearly indicates. 

  5. But as any regular readers will know, brevity is not one of my core competencies. 

  6. Strictly speaking that’s not necessarily valid since support for string-based suffixes is optional, but it’s typically the way it’s done on Unix. On Windows the interface identifiers are numerical, and since support for numerical identifiers is mandatory, unlike strings, then they should also work on Unix as well. 

  7. Note that as with other functions in random, this function generates pseudorandom values which aren’t suitable for cryptographic purposes. If you want more secure bytes, use either os.urandom() or the token_bytes() function from the secrets module that was added in Python 3.6.

Photo by Jan Kopriva on Unsplash