☑ Python 2to3: What’s New in 3.6 - Part 3, Module Updates

22 Aug 2021 at 12:56AM in Software

In this series looking at features introduced by every version of Python 3, we finish our look at the updates in Python 3.6. This third and final article looks at the updates to library modules in this release.

This is the 14th of the 15 articles that currently make up the “Python 2to3” series.


As well as all the more significant changes in Python 3.6, which were outlined in the previous two articles, there are also the usual slew of standard library updates and additions. We’ll look at most of these in this article.

Compression & Archiving

We’ll kick off with some small changes to the zipfile module.

First up, the ZipInfo class, which represents the metainformation about a member of a zip file archive, has a new class method from_file() which allows creation of a ZipInfo instance from the specified filename. This allows code to create an instance from a file, but then override specific fields (e.g. last modified time) and then add it to an archive.

There’s also a ZipInfo.is_dir() method which is the equivalent of os.path.isdir() for a zip file entry, and ZipFile.open() now allows data to be added / updated in a zipfile, instead of just being used to extract it.

Together these changes make for a more flexible interface for manipulating ZIP files.
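As a rough sketch of how these pieces fit together, the snippet below builds a ZipInfo from a file on disk, overrides its timestamp and then streams the data into an in-memory archive via ZipFile.open(). The file name, the arcname and the chosen timestamp are all invented for illustration.

```python
import io
import os
import tempfile
import zipfile

buffer = io.BytesIO()
with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a small throwaway file to archive (hypothetical name).
    source_path = os.path.join(tmp_dir, "notes.txt")
    with open(source_path, "w") as fd:
        fd.write("hello\n")

    # Build the metadata from the file, then override the modified time.
    info = zipfile.ZipInfo.from_file(source_path, arcname="docs/notes.txt")
    info.date_time = (2021, 8, 22, 0, 56, 0)

    with zipfile.ZipFile(buffer, "w") as archive:
        # open() in write mode returns a writable file-like object.
        with open(source_path, "rb") as src, archive.open(info, "w") as dest:
            dest.write(src.read())

# Read the archive back to confirm what was stored.
with zipfile.ZipFile(buffer) as archive:
    stored = archive.getinfo("docs/notes.txt")
    stored_time = stored.date_time
    stored_is_dir = stored.is_dir()
    stored_data = archive.read("docs/notes.txt")

# A name ending in "/" is how a directory entry is recognised.
dir_entry_is_dir = zipfile.ZipInfo("docs/").is_dir()
```

The same open-for-writing pattern should also work with a plain member name instead of a ZipInfo, if you don't need to control the metadata.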

Concurrency

There’s a minor change to concurrent.futures where the ThreadPoolExecutor constructor now accepts a thread_name_prefix parameter, which allows you to insert some uniquely identifying string as a prefix of the names of the threads created by it. This could be extremely helpful when debugging issues in heavily multithreaded applications, where it can become quite confusing which threads were created where.
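A minimal sketch of the new parameter is below. The prefix "fetcher" and the helper function are just placeholders I've picked for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def current_thread_name(_task):
    # Report which worker thread picked up this task.
    return threading.current_thread().name


with ThreadPoolExecutor(max_workers=2, thread_name_prefix="fetcher") as pool:
    names = list(pool.map(current_thread_name, range(4)))

# Every worker name starts with the prefix, followed by a
# suffix chosen by the executor.
print(sorted(set(names)))
```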

There are also some asyncio changes substantial enough to warrant their own section.

asyncio

There are a whole host of smaller changes to asyncio, whose API is now considered stable as of this release. Many of these were backported to Python 3.5.1, and hence I discussed them in the earlier article on changes in 3.5, and there was also some discussion in the section on asynchronous generators in the first Python 3.6 article.

There are a few notable changes I didn’t cover [1] in any earlier articles, however:

get_event_loop() always returns current loop
When called from coroutines and callbacks, get_event_loop() now always returns the currently executing loop. Outside these contexts the previous behaviour still applies, which is to return whatever loop has been set within the current thread by calling set_event_loop(), or to raise RuntimeError except in the main thread where an event loop is created on-demand.
Transport.is_closing() added
This function returns True if the specified transport is closed or in the process of closing.
loop.stop() behaviour changed
There has been a change in the logic of loop.stop() which changes the behaviour around the execution of scheduled callbacks. Previously, calling stop() scheduled a callback which raised a private exception which stopped the loop. This has changed to simply set a flag in the loop which is checked every time a complete cycle of the loop is run. This means that if stop() is called by a callback, only the remainder of the current loop executes — any further callbacks scheduled by the existing callbacks being run will not run until the loop is started again later. If stop() is called prior to the loop running, it’ll run one complete cycle and then exit. You can find more discussion in this Python issue report if you want the details.
loop.connect_accepted_socket() added
This method is used in cases where code wishes to accept a connection synchronously, but then use asyncio to handle data transfer on it.
TCP_NODELAY by default
All TCP connections now set the TCP_NODELAY option to disable Nagle’s algorithm, which can badly impact performance in modern TCP/IP stacks for some workloads, particularly when combined with delayed ACKs. If you have a use-case where Nagling would actually be preferable, you can always set the option yourself using transport.get_extra_info("socket") to obtain the socket.socket instance and calling setsockopt() on it.
Performance improvements
Both Future and Task now have fast C implementations, improving performance by up to 30%.
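The loop.stop() change described above is perhaps easiest to see in a small experiment. This is only a sketch using callbacks I've made up: the first callback stops the loop and schedules a second one, which doesn't run until the loop is started again.

```python
import asyncio

events = []
loop = asyncio.new_event_loop()


def first():
    events.append("first")
    loop.stop()             # Just sets a flag on the loop.
    loop.call_soon(second)  # Scheduled, but not part of this cycle.


def second():
    events.append("second")


loop.call_soon(first)
loop.run_forever()
after_first_run = list(events)

# Restart the loop: the pending callback runs, then we stop again.
loop.call_soon(loop.stop)
loop.run_forever()
after_second_run = list(events)
loop.close()

print(after_first_run, after_second_run)
```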

Cryptography

The hashlib module has acquired support for some new algorithms in this release.

Firstly, there are two new functions which implement the BLAKE2 hashing algorithm, defined in RFC 7693. The creators claim that it’s both fast and highly secure, and they make some persuasive arguments.

The algorithm comes in two variants: BLAKE2b is optimised for 64-bit platforms, producing digests of up to 64 bytes, and BLAKE2s is optimised for smaller platforms down to 8-bit, producing digests of up to 32 bytes. The two variants are supported with the functions blake2b() and blake2s() respectively. I’m guessing most users of Python can assume a 64-bit platform these days, but it’s useful to have both variants in case you’re interoperating with smaller platforms.

Unlike some hashing algorithms, BLAKE2 allows any digest size to be generated, which needs to be selected when the hashing object is created. As you can see from the snippet below, this is an integral part of the algorithm as opposed to simply being a truncation operation at the end.

>>> import hashlib
>>> hasher = hashlib.blake2b()
>>> hasher.update(b"Hello, world\n")
>>> hasher.hexdigest()
'3028a38d034e6e5ef7bda22013d4fa20ca5cfb1fc48f8ef0984fba2cbcf9650dec54be93f51ea3f6fdc39e68473abf00a1dca08672f4dd8201b171bb01ad3129'
>>> hasher = hashlib.blake2b(digest_size=32)
>>> hasher.update(b"Hello, world\n")
>>> hasher.hexdigest()
'202e210b86fa26eb50f3ca8cc268db9f1bd83687a4d91ec37f16e062b6a7362e'

Both of these variants support a salt parameter for salted hashing, as well as a person parameter for an additional “personalisation string”. This is like an additional salt which is set according to the context in which the hash is generated, to reduce the chances that a hash generated in one part of the code can be reused by an attacker to bypass a different protection which is using the same algorithm. The functions also take a key parameter, so BLAKE2 can be used to implement a form of HMAC [3].
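To illustrate the keyed and personalised modes, here's a small sketch. The key, the message and the two purpose strings are placeholders of my own, and the tag() helper is hypothetical rather than anything in hashlib.

```python
import hashlib
import hmac


def tag(message: bytes, key: bytes, purpose: bytes) -> bytes:
    """Return a 16-byte keyed BLAKE2b digest bound to a purpose string."""
    return hashlib.blake2b(
        message, key=key, person=purpose, digest_size=16
    ).digest()


secret_key = b"not-a-real-key"  # Placeholder, use a random key in practice.
login_tag = tag(b"user=42", secret_key, b"login-token")
reset_tag = tag(b"user=42", secret_key, b"reset-token")

# Same key and message, but different personalisation, so the tags
# can't be swapped between the two contexts.
tags_differ = not hmac.compare_digest(login_tag, reset_tag)
print(len(login_tag), tags_differ)
```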

SHA-3 support has also been added at various hash sizes with the functions sha3_224(), sha3_256(), sha3_384() and sha3_512(). There’s also support for the SHAKE extendable-output function (XOF) to generate arbitrary length digests, and this is provided by the shake_128() and shake_256() functions. Although the digest length is arbitrary, the functions only offer 128- and 256-bit security against collisions and other attacks. Unlike the earlier BLAKE2 functions, the SHAKE functions allow you to select the digest length in bytes at generation time.

>>> import hashlib
>>> hasher = hashlib.shake_256()
>>> hasher.update(b"Hello, world\n")
>>> hasher.hexdigest(8)
'd1be3aba87b7379f'
>>> hasher.hexdigest(32)
'd1be3aba87b7379f90f9b3014ef68cd940ed41dd2088e878ebdc9866bba9e254'

Finally, there’s also a new scrypt() function which, unsurprisingly, implements the scrypt password-based key derivation function. This is designed to require more hardware resources than earlier approaches like PBKDF2, which makes it harder to implement in hardware and thus harder to crack. If you want to use this function, I would do some reading around it to select appropriate values for the mandatory n, r and p parameters — in particular if you choose an insufficient n then you’ll impair the security.
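For a feel of the interface, a minimal sketch follows. The cost parameters here are purely illustrative and not a recommendation, the derive_key() helper is my own invention, and scrypt() is only available when Python is built against a suitable OpenSSL.

```python
import hashlib
import hmac
import os


def derive_key(password: str, salt: bytes) -> bytes:
    # Illustrative cost parameters only: n=2**14, r=8, p=1 needs
    # roughly 16 MiB of memory per derivation.
    return hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32
    )


salt = os.urandom(16)  # Store this alongside the derived key.
stored = derive_key("correct horse", salt)

# Verify by re-deriving and comparing in constant time.
good_login = hmac.compare_digest(stored, derive_key("correct horse", salt))
bad_login = hmac.compare_digest(stored, derive_key("wrong guess", salt))
print(good_login, bad_login)
```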

It’s good to see hashlib keeping up with newer algorithms, especially scrypt() given the advances in the ability of attackers to crack password hashes.

Data Structures

There are some new abstract base classes and a couple of changes to namedtuple in collections, as well as some new enum base types.

collections

There are some new abstract base classes in collections.abc to round out some gaps:

Collection
Represents containers which are a Container, Sized and Iterable [2], but not indexable or reversible as a Sequence is.
Reversible
Simply an Iterable which also provides __reversed__().
AsyncGenerator
Base class for asynchronous generators, covered in an earlier article.
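A quick sketch of how the first two of these behave with isinstance() is below. The Bag class is a made-up example which is never registered with any ABC, it just provides the three required methods.

```python
from collections.abc import Collection, Reversible, Sequence


class Bag:
    """Minimal container: sized, iterable and supports 'in'."""

    def __init__(self, items):
        self._items = list(items)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


bag = Bag("abc")
bag_is_collection = isinstance(bag, Collection)  # Has all three methods.
bag_is_sequence = isinstance(bag, Sequence)      # No indexing support.
list_is_reversible = isinstance([1, 2], Reversible)
set_is_reversible = isinstance({1, 2}, Reversible)
print(bag_is_collection, bag_is_sequence, list_is_reversible, set_is_reversible)
```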

There are also a couple of changes to collections.namedtuple. Firstly, you can pass a module keyword parameter to set the __module__ attribute of the constructed class. Secondly, the verbose and rename parameters are now keyword-only — this change is not backwards compatible, but the risk of breakage in real code was deemed acceptably low.

Finally a bit of an obscure one, but it’s now possible to pickle a recursive collections.deque — that is, if the deque contains a reference to itself. Here’s what used to happen in Python 3.5:

>>> x = collections.deque()
>>> x.append(x)
>>> pickle.dumps(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RecursionError: maximum recursion depth exceeded while pickling an object

Python 3.6, on the other hand, handles this fine:

>>> x = collections.deque()
>>> x.append(x)
>>> pickle.dumps(x)
b'\x80\x03ccollections\ndeque\nq\x00)Rq\x01h\x01a.'

enum

Back in the first Python 3.4 article we saw the addition of the enum module, with the base classes Enum and IntEnum. In Python 3.6 these have been joined by two new base classes, Flag and IntFlag.

These are similar in use to Enum and IntEnum, except that they support bitwise operations and still remain the same type. This is easiest to explain with an example — in the code snippet below you can see usage of both the original IntEnum class and the new IntFlag class.

>>> import enum
>>> class NormalIntEnum(enum.IntEnum):
...     ONE = enum.auto()
...     TWO = enum.auto()
...     THREE = enum.auto()
...
>>> class FlagIntEnum(enum.IntFlag):
...     ONE = enum.auto()
...     TWO = enum.auto()
...     THREE = enum.auto()
...
>>> NormalIntEnum.THREE.value
3
>>> FlagIntEnum.THREE.value
4
>>> NormalIntEnum.ONE | NormalIntEnum.TWO | NormalIntEnum.THREE
3
>>> FlagIntEnum.ONE | FlagIntEnum.TWO | FlagIntEnum.THREE
<FlagIntEnum.THREE|TWO|ONE: 7>
>>> (FlagIntEnum.ONE | FlagIntEnum.TWO | FlagIntEnum.THREE).value
7
>>> (FlagIntEnum.ONE | FlagIntEnum.TWO | FlagIntEnum.THREE) & 3
<FlagIntEnum.TWO|ONE: 3>

The enumeration values are automatically numbered with enum.auto(), but note that the THREE value differs between the two. That’s because in a standard enumeration, consecutive integers 1, 2, 3, 4, etc. will be used. To support bitwise operations, however, the values become bits, so the values are powers of two 1, 2, 4, 8, etc. That’s why NormalIntEnum.THREE is 3 whereas FlagIntEnum.THREE is 4.

When we apply the bitwise “or” operator | to the NormalIntEnum values, it’s successful but the result is a normal int. When we do it to the FlagIntEnum values, however, the result is another FlagIntEnum which represents the union of those values. You can also mask off values using int directly as you can see from the final line.

These new base classes are pretty handy for efficiently storing flags, and potentially to more readably represent file and network protocol formats which use bitfields for flags. The fact that they interoperate gracefully with int values is a nice touch in particular.
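For comparison, here's a small sketch of the plain Flag base class, which supports the same bitwise operations between its own members but, unlike IntFlag, doesn't mix with bare integers. The Permission class is just an example of mine.

```python
import enum


class Permission(enum.Flag):
    READ = enum.auto()
    WRITE = enum.auto()
    EXECUTE = enum.auto()


read_write = Permission.READ | Permission.WRITE

# Membership tests check whether a flag is set in a combination.
can_read = Permission.READ in read_write
can_execute = Permission.EXECUTE in read_write

# Plain Flag refuses to combine with an int, unlike IntFlag.
try:
    read_write | 1
    mixing_with_int_allowed = True
except TypeError:
    mixing_with_int_allowed = False

print(read_write.value, can_read, can_execute, mixing_with_int_allowed)
```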

Date and Time

As well as the changes for local time disambiguation we already covered in the previous article, there are some other improvements to the datetime module.

New strftime() directives

datetime.strftime() and date.strftime() now support some new directives for ISO 8601 formatted dates and times.

  • %G appears very similar to %Y in that it populates a four-digit year, but apply caution here — the year is selected as being the one that contains most of the current ISO week. This article has a great discussion of why this probably isn’t what you want unless you’re really sure.
  • %u is the ISO weekday in the range [1, 7] where 1 is Monday.
  • %V is the current ISO week as a 2-digit number. ISO defines week 01 as being that which contains 4th January of that year.
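To show why %G needs care, here's a sketch using a date I've picked which falls in the first few days of January. These directives are passed through to the platform's C library, so I'm assuming one which supports them, as glibc does.

```python
import datetime

# Sunday 3rd January 2021 belongs to the last ISO week of 2020.
day = datetime.date(2021, 1, 3)

calendar_label = day.strftime("%Y-%m-%d")
iso_label = day.strftime("%G-W%V-%u")
iso_year, iso_week, iso_weekday = day.isocalendar()

print(calendar_label)  # Calendar year is 2021...
print(iso_label)       # ...but the ISO year is 2020.
```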

datetime.isoformat() accepts timespec parameter

The isoformat() function formats the specified datetime in the ISO format, for example 2021-08-01T13:54:06. The T separator can be customised with the sep parameter, and if the datetime object has a non-zero microseconds attribute then that’s appended, e.g. 2021-08-01T13:54:06.015661. So far no change since Python 3.5.

What’s new is that in Python 3.6 there’s a new timespec parameter to enable you to truncate the string at different points. For example, hours will construct a timestamp like 2021-08-01T13. There are also minutes, seconds, milliseconds and microseconds values. The previous behaviour can be selected with auto, which is also the default, which selects between seconds and microseconds depending on whether the microseconds attribute is zero.
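A short sketch of the different timespec values, using the same example timestamp as above:

```python
import datetime

stamp = datetime.datetime(2021, 8, 1, 13, 54, 6, 15661)

# Truncate the output at progressively finer resolutions.
formats = {
    spec: stamp.isoformat(timespec=spec)
    for spec in ("hours", "minutes", "seconds", "milliseconds", "auto")
}
for spec, text in formats.items():
    print(f"{spec:>12}: {text}")
```

Note that the finer fields are truncated rather than rounded.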

datetime.combine() accepts tzinfo parameter
This method is used to merge a date object and time object into a datetime. This now accepts a tzinfo parameter which can be used to override the tzinfo from the time object.
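As a minimal sketch, combining a date with a naive time and attaching UTC in one step might look like this (the values are arbitrary):

```python
import datetime

day = datetime.date(2021, 8, 1)
clock = datetime.time(13, 54, 6)  # Naive: no tzinfo of its own.

# The tzinfo argument overrides whatever the time object carries.
stamp = datetime.datetime.combine(day, clock, tzinfo=datetime.timezone.utc)
print(stamp.isoformat())
```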

Diagnostics & Testing

The timeit module has a new autorange() method which you can use when you’re not sure how many iterations would be most appropriate to test your code. It calls timeit() with increasing numbers of iterations until the runtime exceeds 200ms.
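In code, autorange() lives on the Timer class. Here's a minimal sketch, where the statement being timed is just an arbitrary example:

```python
import timeit

timer = timeit.Timer("sum(range(1000))")

# Returns the number of loops it settled on and the total time
# those loops took, which will be at least 0.2 seconds.
loops, elapsed = timer.autorange()
print(f"{loops} loops took {elapsed:.3f}s")
```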

Another improvement is that if multiple repetitions (not iterations!) have times that differ by more than a factor of four, a warning is emitted using the standard warnings module.

$ python -m timeit -s 'import time; import random' -n 1 -r 5 'time.sleep(random.random())'
1 loops, best of 5: 67.8 msec per loop
:0: UserWarning: The test results are likely unreliable. The worst
time (798 msec) was more than four times slower than the best time.

There are also a couple of improvements to unittest.mock:

assert_called() and assert_called_once() methods added to Mock
Convenience methods to assert the mock was called at least once and exactly once respectively.
New keyword parameters to reset_mock()
This method still retains its purpose as clearing the call records it’s collected, so it can be reused in another part of the test. However, it now also takes two keyword-only parameters return_value and side_effect, which default to False but can be set to True to additionally reset those behaviours to their default state.
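Here's a small sketch of both changes. The mock is called fetch and returns 42, both of which are arbitrary choices for the example.

```python
from unittest import mock

fetch = mock.Mock(return_value=42)
first_result = fetch("a")
fetch.assert_called()       # Passes: called at least once.
fetch.assert_called_once()  # Passes: called exactly once.

fetch("b")
try:
    fetch.assert_called_once()
    called_once_after_second = True
except AssertionError:
    called_once_after_second = False

# A plain reset clears call records but keeps the return value.
fetch.reset_mock()
count_after_reset = fetch.call_count
value_after_plain_reset = fetch()

# Asking for return_value to be reset restores the default behaviour,
# which is to return a fresh child mock.
fetch.reset_mock(return_value=True)
value_after_full_reset = fetch()
```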

Internet

First up, some smaller changes in the email and http.client modules that don’t warrant their own sections, and then some socket-related changes which are numerous enough that they do.

The email policy framework which we covered in a previous article on Python 3.3 is no longer provisional, and the documentation has been updated to focus on it.

Also, the email.mime classes all now accept a policy keyword to specify the policy to use, as does the email.generator.DecodedGenerator constructor. In addition there’s a new message_factory attribute of policies which specifies the callable used to construct new messages. For the compatibility policy compat32 the default is Message, for everything else it’s the newer EmailMessage class.
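The effect of the policy on the class of message that gets constructed can be seen in this sketch, which parses the same (made-up) message text under both policies:

```python
import email
import email.message
import email.policy

raw = "Subject: Hello\n\nBody text\n"

# With no policy specified the parser uses compat32.
legacy = email.message_from_string(raw)
# With a newer policy, the newer message class is constructed.
modern = email.message_from_string(raw, policy=email.policy.default)

print(type(legacy).__name__, type(modern).__name__)
```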

The http.client has a small but useful change — HTTPConnection.request() and endheaders() now support chunked encoding of request bodies. I’ve written in the past about how it’s always been a source of frustration to me that chunked encoding of HTTP requests isn’t better supported, so this is a welcome change.

In a related change, urllib.request also supports this in the AbstractHTTPHandler class which is the base class for both HTTPHandler and HTTPSHandler. If no Content-Length header is added, and the request body isn’t a fixed-sized bytes object, then chunked encoding will be used.

socket and socketserver

The socket module has support for new option constants for use with getsockopt() on systems which support them (at least Linux but not, for example, MacOS).

SO_DOMAIN
Returns the domain of the socket as an integer, which will be something like AF_INET. This is read-only.
SO_PROTOCOL
Returns the protocol of the socket as an integer, which will be something like IPPROTO_IPV4 [6]. This is read-only.
SO_PEERSEC
Returns the security context of the socket connected to this one. I’m only vaguely familiar with this, but I believe it’s pretty esoteric and a full discussion of network access control by labels is way beyond the scope of this article. Suffice to say this is only meaningful if you’re using IPsec and/or NetLabel support, and I’m not sure how much outside of SELinux supports this stuff anyway.
SO_PASSSEC
This is specific to Unix domain sockets (AF_UNIX) and controls whether the SELinux security label of the peer socket can be received in an ancillary message of type SCM_SECURITY. If that means nothing to you, don’t let it worry you. It’s another esoteric one that only a few people will likely care about.

There’s also a new form of setsockopt() which is apparently required in some cases. The existing options take a level parameter, often SOL_SOCKET, and an optname to specify which option. There’s then a value parameter which is either an int or a bytes object specifying a buffer in a format which is specific to each option. It seems that this isn’t sufficient for some cases, however, which require a NULL to be passed for the buffer, with the length of the buffer being used by the option in question instead of any data in the buffer. The new form supports this by allowing a None as the value and an optlen parameter which is mandatory if None is used.

Here’s an example of this from the standard library:

algo.setsockopt(socket.SOL_ALG, socket.ALG_SET_AEAD_AUTHSIZE, None, taglen)

Another enhancement to socket is support for the AF_ALG family for interfacing with the Linux Kernel crypto API. Needless to say, this is only available on Linux. The SOL_ALG and ALG_* constants were added to socket, as well as sendmsg_afalg() which is a specialised form of sendmsg() which sets various parameters of an AF_ALG socket.

Sticking with Linux-specific changes, there are two more additional constants added. These are both used with level as socket.IPPROTO_TCP.

TCP_USER_TIMEOUT
Specified with a non-negative integer argument, this specifies the maximum amount of time in milliseconds that transmitted data may remain unacknowledged, or buffered data may remain untransmitted, before the connection is forcibly closed and ETIMEDOUT is returned. Specifying zero will use the system default. Setting a low value is helpful where “fail fast” behaviour is desirable, or large values are useful where connections should persist even in the presence of extended periods of disconnection.
TCP_CONGESTION
Specified with a string parameter, this sets the TCP congestion control algorithm to use on the connection. For unprivileged processes this must be one of the set allowed in tcp_allowed_congestion_control under /proc, but privileged processes with the CAP_NET_ADMIN capability can use any setting. On Linux, to see what’s available (not just allowed) take a look at tcp_available_congestion_control under /proc.
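As a sketch of using the first of these, the helper below sets TCP_USER_TIMEOUT where the constant exists and reads it back. The set_user_timeout() function and the 5000ms value are my own illustrative choices, and on platforms without the constant the helper simply returns None.

```python
import socket


def set_user_timeout(sock, milliseconds):
    """Set TCP_USER_TIMEOUT if available and return the value read back."""
    option = getattr(socket, "TCP_USER_TIMEOUT", None)
    if option is None:
        return None  # Not supported on this platform.
    sock.setsockopt(socket.IPPROTO_TCP, option, milliseconds)
    return sock.getsockopt(socket.IPPROTO_TCP, option)


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    applied_timeout = set_user_timeout(sock, 5000)

print(applied_timeout)
```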

As well as socket changes, there are also a couple of changes in the higher-level socketserver module.

First up, the server classes in socketserver now support the context-manager protocol, to make it easier to ensure the server’s server_close() method is called:

with socketserver.TCPServer(("", 1234), MyHandler) as server:
    server.serve_forever()

Secondly, the wfile attribute of StreamRequestHandler has been updated to implement the io.BufferedIOBase interface. Prior to this change, it was a simple wrapper around an underlying filehandle and therefore could perform partial writes, which library code was often ill equipped to handle. With this change, a write() call will use the sendall() method on the socket, so blocking until all data has been successfully written.
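Putting both of these changes together, here's a rough self-contained sketch of a line-echoing server on the loopback interface. The EchoHandler class is hypothetical, and I've asked the OS for any free port so the example doesn't depend on a particular one.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.StreamRequestHandler):
    def handle(self):
        line = self.rfile.readline()
        # write() now sends the whole buffer or raises.
        self.wfile.write(b"echo: " + line)


# Port 0 asks the OS to pick any free port.
with socketserver.TCPServer(("127.0.0.1", 0), EchoHandler) as server:
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()

    with socket.create_connection(server.server_address) as client:
        client.sendall(b"hello\n")
        reply = client.makefile("rb").readline()

    server.shutdown()  # Stop serve_forever() before leaving the block.
    thread.join()
# server_close() has been called for us at this point.

print(reply)
```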

ssl

The ssl module has also received some more improvements this release to improve security and add support for OpenSSL 1.1.0. The insecure Triple DES cipher has been dropped from the default cipher suites, and the ChaCha20 cipher and Poly1305 MAC, which were standardised in RFC 8439, have been added.

The SSLContext class now has some more secure defaults, which used to be overridden in ssl.create_default_context() instead. PROTOCOL_TLS, added in this release, is now the default and it always selects the highest version supported by both client and server. Additionally, SSLv2 and v3 are explicitly disabled by default due to their insecurity. Only cipher suites that OpenSSL classifies as HIGH encryption are included, with no MD5-based ciphers. There are some other more secure default settings which are a bit more involved so I’ll pass over them here.

As well as the new PROTOCOL_TLS, there are also two more specific protocols that can be selected, PROTOCOL_TLS_CLIENT and PROTOCOL_TLS_SERVER, which are only suitable for use with the client or server end of the connection respectively. The reason for the difference is that the sensible and secure defaults differ between these two ends. For example, PROTOCOL_TLS_CLIENT enables CERT_REQUIRED and check_hostname by default.
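The difference in defaults between the two ends is easy to check for yourself, as in this small sketch:

```python
import ssl

client_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
server_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)

# The client end verifies the server certificate and hostname by
# default, whereas the server end doesn't demand a client certificate.
for name, ctx in (("client", client_ctx), ("server", server_ctx)):
    print(name, ctx.verify_mode, ctx.check_hostname)
```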

To see which ciphers are in use, the SSLContext class also has a new get_ciphers() method which returns details of all the ciphers that are available and enabled, in priority order.

>>> pprint.pprint(context.get_ciphers()[0])
{'aead': True,
 'alg_bits': 256,
 'auth': 'auth-any',
 'description': 'TLS_AES_256_GCM_SHA384  TLSv1.3 Kx=any      Au=any  '
                'Enc=AESGCM(256) Mac=AEAD',
 'digest': None,
 'id': 50336514,
 'kea': 'kx-any',
 'name': 'TLS_AES_256_GCM_SHA384',
 'protocol': 'TLSv1.3',
 'strength_bits': 256,
 'symmetric': 'aes-256-gcm'}
>>> print("\n".join(i["name"] for i in context.get_ciphers()))
TLS_AES_256_GCM_SHA384
TLS_CHACHA20_POLY1305_SHA256
TLS_AES_128_GCM_SHA256
ECDHE-ECDSA-AES256-GCM-SHA384
# ... lots of entries skipped for brevity ...
CAMELLIA128-SHA256
CAMELLIA256-SHA
CAMELLIA128-SHA

There’s a new SSLSession object which has been added to support copying of an existing session from one client-side connection to another. This allows TLS session resumption which improves performance. It can be even more important for repeated connections to FTP servers as RFC 4217 §10.2 explicitly mentions the possibility that FTP servers may refuse to waste CPU cycles by repeatedly performing TLS negotiation with the same client, returning a 522 error code instead.

The session is exposed as the session attribute of the SSLSocket object, so you can retrieve it from your first connection and pass it using a new session parameter to the SSLContext.wrap_socket() method for your second and subsequent connections. See the simplified example below:

>>> import socket
>>> import ssl
>>>
>>> context = ssl.create_default_context()
>>> with socket.create_connection(("www.andy-pearce.com", 443)) as sock:
...     with context.wrap_socket(sock,
...                              server_hostname="www.andy-pearce.com",
...                             ) as ssl_sock:
...         session = ssl_sock.session
...         ssl_sock.send(b"GET https://www.andy-pearce.com/blog/ HTTP/1.1\r\n"
...                       b"Host: andy-pearce.com\r\n\r\n")
...         print(ssl_sock.read(8192).splitlines()[0])
...
73
b'HTTP/1.1 200 OK'
>>> session
<_ssl.Session object at 0x10b12b940>
>>>
>>> with socket.create_connection(("www.andy-pearce.com", 443)) as sock:
...     with context.wrap_socket(sock,
...                              server_hostname="www.andy-pearce.com",
...                              session=session
...                             ) as ssl_sock:
...         ssl_sock.send(b"GET https://www.andy-pearce.com/blog/ HTTP/1.1\r\n"
...                       b"Host: andy-pearce.com\r\n\r\n")
...         print(ssl_sock.read(8192).splitlines()[0])
...
73
b'HTTP/1.1 200 OK'

It’s a bit of a shame that the library can’t implement session reuse automatically behind the scenes, but I suppose there are probably cases where that would break things. In any case, at least applications have the option now.

Language Support

There’s a new contextlib.AbstractContextManager abstract base class for context managers — that is, objects which implement __enter__() and __exit__() methods. For simple cases, the __enter__() method in the base class returns self, while __exit__() is left abstract for subclasses to implement. There’s also a corresponding typing.ContextManager class for type hints.
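A minimal sketch of using the base class is below. The Stopwatch class is my own example, and the only method it needs to supply is __exit__().

```python
import contextlib
import time


class Stopwatch(contextlib.AbstractContextManager):
    """Records how long the body of the with block took."""

    def __init__(self):
        self.start = time.perf_counter()
        self.elapsed = None

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self.start
        return None  # Don't suppress any exception.


# The inherited __enter__() returns the instance itself.
with Stopwatch() as watch:
    total = sum(range(100_000))

print(f"{watch.elapsed:.6f}s")
```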

On the subject of typing there are a few more improvements there:

Better support for generic type aliases
These are now allowed to be mixed with concrete types in arbitrarily complex ways (e.g. Dict[str, Tuple[S, T]]).
typing.Collection added
The mirror of collections.abc.Collection.
typing.TYPE_CHECKING added
Evaluates to True when type checking, but False at runtime. This allows conditional behaviours which are only required for type-checking, such as importing additional modules.
typing.NewType() added
A way to create lightweight type aliases which are considered distinct types. For example, JSON = typing.NewType("JSON", str) creates a new type JSON which is considered a subclass of str but is treated as a distinct type. With appropriate annotations, this means the type checker can detect you passing the result of a function that returns (say) XML into a function that expects JSON, even though at runtime it’s all just str.
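To make the last two of these a little more concrete, here's a sketch with two hypothetical helper functions. At runtime everything is a plain str, the distinction only exists for the type checker.

```python
import json
import typing

JSON = typing.NewType("JSON", str)
XML = typing.NewType("XML", str)

if typing.TYPE_CHECKING:
    # Only evaluated by static type checkers, never at runtime.
    from decimal import Decimal  # noqa: F401


def parse_json(document: JSON) -> dict:
    return json.loads(document)


def render_xml(tag: str) -> XML:
    return XML(f"<{tag}/>")


config = parse_json(JSON('{"debug": true}'))
runtime_type = type(JSON("{}"))  # Just str once the code is running.

# A type checker should reject the line below, since XML isn't JSON,
# even though both are str underneath:
#     parse_json(render_xml("config"))
print(config, runtime_type)
```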

Mathematics

The decimal.Decimal() class has a potentially useful as_integer_ratio() method which converts the decimal into the simplest fraction with the same value. The result is returned as a 2-tuple of (numerator, denominator).

The constant math.tau has been added. This is simply defined as 2π, but this is a more useful constant in many situations. You can read PEP 628 for more details [4].
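Both of these additions can be tried out in a couple of lines. The decimal values here are arbitrary examples:

```python
import math
from decimal import Decimal

# Simplest fraction with the same value, as (numerator, denominator).
three_quarters = Decimal("0.75").as_integer_ratio()
negative = Decimal("-3.125").as_integer_ratio()

# A quarter turn is simply a quarter of tau radians.
quarter_turn_degrees = math.degrees(math.tau / 4)

print(three_quarters, negative, quarter_turn_degrees)
```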

The random module has also acquired a new choices() function as a more general form of the existing choice(). This version selects a specified number of items from the population, with replacement [5]. There’s also an optional weights parameter if you want a biased choice.

>>> import random
>>> random.choices(range(10), weights=range(10, 0, -1), k=10)
[3, 4, 5, 6, 0, 4, 2, 5, 4, 6]

Outside of purely statistical analysis, I can see this being useful in applications such as load balancing, where you want to direct a set of requests to servers balanced according to their current available capacity.
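As a sketch of that load balancing idea, the snippet below spreads 1,000 requests across three servers in proportion to their spare capacity. The server names and capacity figures are invented, and I've used a seeded Random instance only so that repeated runs give the same assignments.

```python
import collections
import random

# Hypothetical servers and their current spare capacity.
capacity = {"alpha": 6, "beta": 3, "gamma": 1}

rng = random.Random(2021)
assignments = rng.choices(
    list(capacity), weights=list(capacity.values()), k=1000
)
counts = collections.Counter(assignments)

# Servers with more spare capacity receive proportionally more requests.
for server, count in counts.most_common():
    print(server, count)
```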

The statistics module, added back in Python 3.4, has acquired a new harmonic_mean() function which, unsurprisingly, calculates the harmonic mean of a set of data points. I seem to recall that this mean is often the most appropriate when calculating the average of a set of rates or ratios, but it’s been a very long time since A-Level maths so I’d suggest turning elsewhere to learn about the Pythagorean means.
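The classic example of averaging rates is a journey driven at one speed out and another speed back over the same distance, which this sketch uses with figures of my own choosing:

```python
import statistics

# Outbound at 40 km/h, return at 60 km/h over the same distance.
speeds = [40, 60]

average_speed = statistics.harmonic_mean(speeds)
naive_average = statistics.mean(speeds)

# The harmonic mean gives the true average speed for the whole trip,
# which is lower than the arithmetic mean.
print(average_speed, naive_average)
```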

Operating System

The iterator os.scandir() now offers a close() method to make sure associated resources are freed in the case that iteration wasn’t completed for some reason. More usefully, it also now supports the context manager protocol to make sure this cleanup happens. If you don’t either exhaust the iterator or call close(), explicitly or implicitly via the context manager, its destructor will emit a ResourceWarning to let you know.

>>> import os
>>> import warnings
>>> warnings.resetwarnings()
>>> x = os.scandir("/tmp")
>>> del x
__main__:1: ResourceWarning: unclosed scandir iterator <posix.ScandirIterator object at 0x105042b70>

You can see an example of using scandir() as a context manager in the small script below.

import os
import pathlib

base_dir = pathlib.Path.home()
with os.scandir(base_dir) as dir_entries:
    for entry in dir_entries:
        if entry.name.startswith("runcount-"):
            new_count = int(entry.name.split("-", 1)[1]) + 1
            os.remove(entry.path)
            break
    else:
        new_count = 1

with open(base_dir.joinpath(f"runcount-{new_count}"), "w") as fd:
    fd.write(str(new_count) + "\n")

There’s also an improvement to the shlex module, which provides simple shell-like parsing facilities for strings. This takes the form of a new punctuation_chars parameter to shlex.shlex() which defaults to False for pre-3.6 behaviour, but can be set to a string of characters to be regarded as punctuation meaningful to the shell. If specified, a consecutive block with any of these characters is treated as a single token. Alternatively, passing a value of True causes it to use an internal default of common shell punctuation.

>>> import shlex
>>> command_string = "/opt/bin/app_status && /opt/bin/app_shutdown >$LOGFILE 2>&1"
>>> list(shlex.shlex(command_string))
['/', 'opt', '/', 'bin', '/', 'app_status', '&', '&', '/', 'opt', '/', 'bin', '/', 'app_shutdown', '>', '$', 'LOGFILE', '2', '>', '&', '1']
>>> list(shlex.shlex(command_string, punctuation_chars=True))
['/opt/bin/app_status', '&&', '/opt/bin/app_shutdown', '>', '$', 'LOGFILE', '2', '>&', '1']
>>> list(shlex.shlex(command_string, punctuation_chars="&>12"))
['/opt/bin/app_status', '&&', '/opt/bin/app_shutdown', '>', '$', 'LOGFILE', '2>&1']

This falls short of full shell parsing, but comes pretty close for simple cases. The collapsing of paths into a single token is particularly helpful, although unfortunately this doesn’t extend to proper support for backslashed spaces.

>>> list(shlex.shlex(r"/opt/bin/app\ status", punctuation_chars=True))
['/opt/bin/app', '\\', 'status']

A small change to the subprocess module is that it now emits a ResourceWarning if the child process is still running when the Popen object destructor is called. You shouldn’t ever see this if you use it as a context manager or explicitly call wait(), but it’s a useful hint for more complicated cases. Also, there’s a new encoding parameter to specify the encoding to use for stdin, stdout and stderr, if used.
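Here's a minimal sketch of the encoding parameter, using a child Python process which writes some UTF-8 bytes to its standard output. The child's code is just something I've contrived for the example.

```python
import subprocess
import sys

# The child writes raw UTF-8 bytes, including a non-ASCII character.
child_code = "import sys; sys.stdout.buffer.write('caf\\xe9\\n'.encode('utf-8'))"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    encoding="utf-8",  # Decode the pipe as text using this encoding.
)
text = result.stdout
print(repr(text))
```

Without the encoding parameter, result.stdout would be a bytes object and the decoding would be left to the caller.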

Text Parsing

The regular expression language supported by the re module has once again been extended by allowing flags to be specified inline for a particular subset of a pattern. For example, in Python 3.5 the re.IGNORECASE or re.I flags can be passed via the flags parameter to several of the functions (e.g. re.compile()) or the sequence (?i) can be specified within the pattern itself — both of these have the effect of specifying case-insensitive matching across the entire pattern. The addition within Python 3.6 is that you can specify (?i:subpattern) to ignore case only for subpattern, without affecting the flag for the remainder of the pattern. You can also specify (?-i:subpattern) to add case-sensitivity to an otherwise case-insensitive pattern.

>>> import re
>>> pattern = re.compile(r"(?i)IT(?-i:aa)S")
>>> pattern.match("itaas") is None
False
>>> pattern.match("ITaas") is None
False
>>> pattern.match("ITAAS") is None
True

Another handy change is not to the regular expression language, but to the Match objects returned on a successful match. These now support indexing as an alias for calling group() — examples in the code snippet below. Rather more esoterically, now any object that supports an __index__() method can be passed instead of just an int.

>>> import re
>>> pattern = re.compile(r"(?i)my name is ([a-z]*)")
>>> match = pattern.match("My name is Andy")
>>> match[0]
'My name is Andy'
>>> match[1]
'Andy'
>>> pattern = re.compile(r"(?i)my name is (?P<name>[a-z]*)")
>>> match = pattern.match("My name is Guido")
>>> match["name"]
'Guido'

Deprecations

Finally, the following things have been deprecated in this release, and will be removed in a future release.

  • asynchat and asyncore deprecated in favour of asyncio.
  • In distutils, the extra_path parameter to the Distribution constructor is considered deprecated.
  • In grp.getgrgid(), support for non-int arguments is deprecated.
  • Support for bytes-like objects as paths in the os module, which was never documented behaviour, is now explicitly deprecated.
  • Version 1.0.1 and earlier of OpenSSL are no longer supported.
  • SSL parameters like certfile, keyfile and check_hostname which used to be specified directly in modules such as ftplib, http.client, imaplib, poplib and smtplib, are now deprecated in favour of using an SSLContext.

Conclusions

As usual some useful changes. The asyncio updates are welcome, and it’s not surprising that the interface to this fairly recently-added module is still evolving more than many others.

There are also some great security-related enhancements, including the new algorithms supported by hashlib, especially scrypt, and the more secure defaults used by the ssl module.

So that’s it for Python 3.6. In the next article I’ll continue my upward climb by looking at major new features added in Python 3.7.


  1. At least as far as I remember, which frankly isn’t ever as far as I’d like. As you may have noticed I tend to cover things in the order they’re mentioned in the release notes, to try and avoid duplication, but if they’re mentioned twice in there then there’s a fair chance I’ll cover them twice here as well, despite my best efforts!

  2. In other words, they support the __contains__(), __iter__() and __len__() methods. 

  3. I’m not in a position to comment on the relative security of using BLAKE2’s keyed mode directly, as opposed to the more established hmac module based on RFC 2104. As a hashing algorithm, however, BLAKE2 can be used with hmac if you wish, although using it directly will almost certainly be noticeably faster. 

  4. Although as you might imagine, there’s a limit to how much you can say about this change. With the exception of PEP 20, this has got to be a good candidate for the shortest PEP ever! 

  5. The existing random.sample() already offers selection without replacement. 

  6. I’m not sure if this will work for anything except AF_INET or AF_INET6, because the protocol constants aren’t necessarily defined. This will be OS-dependent, however. 

Photo by David Clode on Unsplash