In this series looking at features introduced by every version of Python 3, we begin to look at version 3.7 by seeing what major new features were added. These include several improvements to type annotations, some behaviour to cope with imperfect locale information, and a number of diagnostic improvements.
This is the 15th of the 31 articles that currently make up the “Python 3 Releases” series.
Python 3.6 well and truly dealt with, it’s time to skip forward and see what gems Python 3.7 has in store for us. It was released in June 2018, cleaving closely to the 18 month release schedule that’s been standard until recently.
On the face of it, there don’t seem to be any massive changes in this release, but a number of potentially useful ones, including:

- Deferred evaluation of type annotations, improving startup time and allowing forward references.
- Better handling of imperfect locale settings, including a new UTF-8 mode.
- Changes to the way DeprecationWarning is handled, to allow them to be seen by their target audience.

There are also some new modules:

- contextvars, which provides context-local state that works correctly with async code instead of threads.
- dataclasses, which cuts down the boilerplate for classes which primarily store data.
- importlib.resources, for reading data files bundled inside packages.
There are also a lot of improvements to the
asyncio module and many others, so let’s jump in and look at these changes in more detail. In this article we’ll look at the improvements to the language and interpreter, and in the next article we’ll cover the new and improved library modules.
There are two PEPs in this release which solve various problems and improve performance of type hints.
Type hints continue to evolve in this release with the advent of PEP 563, which attempts to resolve two issues: annotations can’t refer to names which haven’t been defined yet (forward references), and evaluating annotations at definition time slows down module import.
The solution to both is to defer the evaluation of annotations, instead storing them in the AST1 as strings. This improves startup time, because the annotations don’t need to be evaluated; and it also allows forward references, because by the time the annotations are evaluated, all the definitions are already available.
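As a quick sketch of how this looks in practice (the Node class here is purely illustrative):

```python
from __future__ import annotations  # opt in to PEP 563 deferred annotations

import typing

class Node:
    # "Node" is a forward reference here: the class isn't fully defined
    # yet, but that's fine because the annotation isn't evaluated now,
    # it's simply stored as the string "Node".
    def set_next(self, node: Node) -> Node:
        self.next = node
        return node

# The raw annotations are stored as plain strings...
assert Node.set_next.__annotations__ == {"node": "Node", "return": "Node"}

# ...and typing.get_type_hints() evaluates them back to real objects.
assert typing.get_type_hints(Node.set_next) == {"node": Node, "return": Node}
```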
You’ll notice from the first line that enabling this feature currently requires an import from
__future__, as this is a potentially backwards-incompatible change. It’s intended to become default behaviour in Python 3.10.
If the annotations are required at runtime for some reason, the string form can be read directly from the __annotations__ attribute, and if the evaluated form is required then
typing.get_type_hints() (or a regular
eval() of the string) will yield that. There are some consequences to bear in mind, however, such as the fact that the annotation is no longer being evaluated in the same scope where it’s being defined — therefore annotations which refer to any local state are going to cause problems.
It’s also worth noting this only applies within annotations themselves — other uses, such as declaring type variables or aliases, will still need to use the string form, because those are still evaluated immediately.
There are also some additional improvements in PEP 560 which are a little less straightforward, but also aim to improve the performance of code using typing.
A big source of slowness is that the
GenericMeta class, the metaclass of all generic classes, has quite a slow
__new__() method — the
GenericMeta has been removed in 3.7 and replaced with the aid of some new core language features. An additional source of slowness is that the method resolution orders (MROs) for generic classes are quite long, due to the mirroring of
collections.abc types in
typing, and there’s also a change to address this.
The first change is that there’s a new
__class_getitem__() method which is always treated as a class method, even without the
@classmethod decorator. This is similar to
__getitem__() except that it’s called on a class rather than its instances. So when a generic class is subscripted, as in a declaration like NewList[int], then
__class_getitem__() is invoked with
int as the parameter. As an aside, this won’t interfere with any
__getitem__() method, since on instances that method is preferred.
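To make the distinction concrete, here’s a small sketch (the Box class is hypothetical):

```python
class Box:
    def __class_getitem__(cls, item):
        # Called for subscription on the class itself, e.g. Box[int].
        # Implicitly treated as a classmethod, no decorator required.
        return f"{cls.__name__}[{item.__name__}]"

    def __getitem__(self, key):
        # Still called for subscription on instances, as before.
        return f"instance item {key}"

assert Box[int] == "Box[int]"         # class subscription
assert Box()[3] == "instance item 3"  # instance subscription
```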
The second change is another new method
__mro_entries__() which classes can define to modify the MRO. When an object which isn’t a class object appears among the base classes of a class definition, any
__mro_entries__() method on it will be called. It will be passed the original
tuple of base classes, and is expected to return a new one which will be used by all further processing. The original tuple is preserved as the
__orig_bases__ class member.
To illustrate this, I’m going to present the exact code snippet used in the PEP, and then put some of my own explanatory notes below to help anyone that’s struggling to understand it — you’re not alone if so, this took me a few minutes of head-scratching!
```python
class GenericAlias:
    def __init__(self, origin, item):
        self.origin = origin
        self.item = item
    def __mro_entries__(self, bases):
        return (self.origin,)

class NewList:
    def __class_getitem__(cls, item):
        return GenericAlias(cls, item)

class Tokens(NewList[int]):
    ...

assert Tokens.__bases__ == (NewList,)
assert Tokens.__orig_bases__ == (NewList[int],)
assert Tokens.__mro__ == (Tokens, NewList, object)
```
Taking the declaration of Tokens on line 12, the first thing Python does is process the list of base classes, in this case the single base NewList[int]. Since NewList has a __class_getitem__() method, this is called with parameter int, and returns GenericAlias(NewList, int) on line 10. This object is then used as the base of Tokens, but since it isn’t a class object and it has an __mro_entries__() method, that is called next to obtain a new base class. This yields the 1-tuple (NewList,), as returned on line 6.
So now the base class of Tokens shows as NewList, as verified on line 15. The assertion on line 16 demonstrates that the original list of bases has been stored under __orig_bases__. Finally, line 17 validates that the MRO is consistent with the new base class list, and it is: methods are searched first on Tokens, then on NewList, and finally on the ultimate base class object.
That got a little deep a little quickly, so for most people I think the main take-away here is that there’s active work being done to ensure the overheads of using type annotations are minimal. Coming from a background in C and C++, I can attest to the value of compile-time type-checking, so I’d strongly urge everyone to try and adopt these practices where they can.
There are also some changes coming up in later Python releases to make things even better, but we’ll get there in a few weeks or so.
Locales are tricky beasts to get right sometimes. A lot of users don’t really properly understand how to set them in their environment, and even those that do often still find certain aspects confusing, and understandably so.
One of the most problematic issues is the POSIX locale, also known as the “C” locale. If you’ve messed with your locale settings you’ve probably seen the
LC_CTYPE=C environment variable on at least one system. The main issue here is that this locale implies use of ASCII encoding, which renders applications unable to handle any international characters in filenames and the like.
In Python 3.7 there are two closely related PEPs which attempt to address this. First up is PEP 538, which specifies some behaviour that’s triggered when the POSIX locale is set. If the
C locale is set, or no locale is set, then Python will now attempt to coerce this into a UTF-8-capable locale — the first of C.UTF-8, C.utf8 and UTF-8 which is available will be used. The locale is coerced by setting
LC_CTYPE to the selected locale, so other calls within the script will return the correct locale. If necessary, this behaviour can be disabled by setting PYTHONCOERCECLOCALE=0.
Secondly, Python 3.7 adds a new UTF-8 mode as per PEP 540. If enabled, this forces Python to ignore the currently set locale entirely, and instead use UTF-8 everywhere as the default encoding. The rationale is that UTF-8 works more or less everywhere on modern platforms, and that simple default may be more likely to be correct than assuming the user has properly configured their locale.
This mode is activated whenever the POSIX Locale is set (the “C” locale), but this will only trigger if the locale coercion described above fails to find a matching locale, or is disabled. The mode can also be activated by passing the
-X utf8 command-line option, or by setting environment variable
PYTHONUTF8=1. It can also be explicitly disabled with
-X utf8=0 or by setting PYTHONUTF8=0.
This has the advantage of supporting UTF-8 regardless of the locales available on the current machine. It has the disadvantage, however, that its changes won’t extend to extension modules, or to non-Python2 child processes.
Enabling UTF-8 mode has the following direct effects:
- sys.getfilesystemencoding() always returns "utf-8".
- locale.getpreferredencoding() always returns "UTF-8".
- The error handler for sys.stdin and sys.stdout is set to surrogateescape, which replaces invalid bytes with reserved code points on decode, and converts them back to the appropriate bytes on encode.
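A quick demonstration of the surrogateescape round-trip:

```python
data = b"caf\xe9"  # Latin-1 encoded bytes: invalid as UTF-8

# Decoding with surrogateescape maps the offending byte to a reserved
# surrogate code point instead of raising UnicodeDecodeError...
text = data.decode("utf-8", errors="surrogateescape")
assert text == "caf\udce9"

# ...and encoding with the same handler restores the original bytes.
assert text.encode("utf-8", errors="surrogateescape") == data
```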
And those changes have the following impact:
- open() uses UTF-8 by default, although it uses the strict error handler3.
- os.fsencode() and os.fsdecode(), for encoding/decoding filesystem paths, use UTF-8.
If all this seems a little fiddly, that’s because it is. I’m sure it doesn’t help that the two PEPs were developed more or less in parallel, but I can see why they’re both kept — the advantages of supporting UTF-8 consistently are significant enough to justify some belt-and-braces coverage. The main conclusion for me is that life’s easier if everyone just makes sure their locale is set properly to something sensible, and we don’t have to resort to hacks like this.
Python has long supported subsecond timestamps by the simple expedient of returning a float rather than an int from time.time() and similar functions. However, the precision of a
float has its limits, and if you want to support nanosecond resolution then a standard 64-bit floating point value will start to drop precision at intervals longer than around 104 days.
To cater for these cases, some new nanosecond-supporting functions have been added to the
time module as per PEP 564. Instead of returning seconds as a
float, these return nanoseconds as an
int to avoid precision loss. The new functions are:

- time.clock_gettime_ns()
- time.clock_settime_ns()
- time.monotonic_ns()
- time.perf_counter_ns()
- time.process_time_ns()
- time.time_ns()

These are all equivalent to the same-named functions without the _ns suffix.
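For example, measuring an interval without any floating point involved:

```python
import time

# perf_counter_ns() returns an int of nanoseconds, so no precision is
# lost however long the process has been running.
start = time.perf_counter_ns()
time.sleep(0.01)
elapsed_ns = time.perf_counter_ns() - start

assert isinstance(elapsed_ns, int)
assert elapsed_ns >= 9_000_000  # roughly the 10ms we slept for
```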
This release also has a few changes primarily of interest whilst developing and testing your scripts.
In Python 3.2, changes were made so that
DeprecationWarning wasn’t shown by default, so that warnings from things like development tools also written in Python wouldn’t be shown to users unless they specifically opted to see them. Unfortunately this also had the effect of significantly limiting the primary purpose of these warnings, to provide developers with advance warning of future breaking changes in APIs.
In Python 3.7, therefore, these warnings have been enabled by default once more, but only for code invoked directly by
__main__. This means that authors of scripts will see warnings for functions they’re using directly, but usages which occur within libraries will be hidden by default on the basis that the library maintainers should be dealing with them. These should be discovered when running unit tests.
To help library maintainers select appropriate warnings for their code, here’s a rundown of the updated visibility of the different warnings:
- DeprecationWarning is shown by default only in code invoked by __main__ and when running unit tests. This is used for usages which will continue to work, but which are deprecated and will be removed in future.
- PendingDeprecationWarning is shown by default only when running unit tests. This is used for upcoming deprecations where removal isn’t yet scheduled.
- FutureWarning is always shown by default. This is used for warnings intended to be seen by end users of applications.
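For library maintainers, raising such a warning might look something like this (old_api() is a hypothetical function):

```python
import warnings

def old_api():
    # stacklevel=2 attributes the warning to the caller of old_api(),
    # which is what the user actually needs to fix.
    warnings.warn("old_api() is deprecated, use new_api() instead",
                  DeprecationWarning, stacklevel=2)
    return 42

# Capture the warning so we can inspect it here.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    assert old_api() == 42

assert caught[0].category is DeprecationWarning
```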
There’s a new development mode which adds various additional runtime checks which are too expensive to be enabled by default. This is intended to be used during development and testing, but generally not when deployed on production systems. The mode can be enabled by passing
-X dev on the command-line, or by setting the
PYTHONDEVMODE environment variable to some non-empty string.
This is primarily a convenience option which enables some pre-existing settings more appropriate for a development environment, which are as follows:
PYTHONMALLOC=debug which I covered in a previous article on Python 3.6.
PYTHONASYNCIODEBUG=1 which enables debug mode in the
asyncio module, which performs additional checks such as finding coroutines which were not awaited, detecting non-threadsafe APIs called from the wrong thread, and raising warnings if some operations take too long.
-W default which sets the default warning filter, which shows each warning once for each distinct location where it’s raised.
-X faulthandler which I covered in a previous article on Python 3.3.
The dev_mode attribute of
sys.flags is also set to
True in this mode, to enable application code to make some of its own checks conditional on it. Some additional checks are added to this mode in future Python versions, but I’m trying hard not to violate causality too much in this series of articles so I’ll defer discussion of those for now.
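Checking the flag from application code is straightforward:

```python
import sys

# Enable extra self-checks only when running under "python -X dev"
# (or PYTHONDEVMODE=1); skip them in production for performance.
if sys.flags.dev_mode:
    print("Development mode: enabling additional validation")
else:
    print("Production mode")
```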
tracemalloc, which I discussed in a previous article on Python 3.4, is not enabled in development mode as the potential performance and memory overheads it incurs are regarded as still too great. Where possible, enabling this in addition can supply more information on the source of issues, such as the location in the source code where memory was first allocated.
The new built-in breakpoint() function offers a standard and concise way to jump into the debugger. This is primarily a convenient shorthand for
import pdb; pdb.set_trace(), but also allows support for other debuggers by indirecting through sys.breakpointhook().
To change the hook you can assign to
sys.breakpointhook directly to install a new hook, or set the
PYTHONBREAKPOINT environment variable. The original hook function is always available as
sys.__breakpointhook__ if you need to reinstall it.
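As a sketch, here’s a custom hook being installed and then the default restored afterwards (quiet_hook is just for illustration):

```python
import sys

calls = []

def quiet_hook(*args, **kwargs):
    # A stand-in debugger hook: just record that breakpoint() was hit.
    calls.append((args, kwargs))

sys.breakpointhook = quiet_hook   # install our hook
breakpoint(tag="demo")            # any arguments are forwarded to the hook
sys.breakpointhook = sys.__breakpointhook__  # restore the default (pdb)

assert calls == [((), {"tag": "demo"})]
```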
For more discussion, see PEP 553.
There’s another new diagnostic option
-X importtime, which can also be enabled by setting the
PYTHONPROFILEIMPORTTIME environment variable. This profiles how long
import statements take to execute, broken down recursively. The time for each individual import, as well as the cumulative time for each module including its dependencies, is shown.
By way of example, here’s an excerpt of the timings of just running
import subprocess — the full chain is quite long, so I’ve removed a chunk in the middle.
$ python -X importtime -c 'import subprocess'
import time: self [us] | cumulative | imported package
import time: 115 | 115 | zipimport
import time: 572 | 572 | _frozen_importlib_external
import time: 60 | 60 | _codecs
import time: 518 | 577 | codecs
import time: 534 | 534 | encodings.aliases
import time: 946 | 2056 | encodings
import time: 256 | 256 | encodings.utf_8
import time: 102 | 102 | _signal
import time: 444 | 444 | encodings.latin_1
import time: 76 | 76 | _abc
import time: 418 | 494 | abc
import time: 470 | 963 | io
import time: 66 | 66 | _sre
import time: 367 | 367 | sre_constants
import time: 30580 | 30946 | sre_parse
import time: 346 | 31357 | sre_compile
import time: 82 | 82 | _locale
import time: 334 | 334 | copyreg
import time: 596 | 32367 | re
import time: 269 | 269 | token
import time: 1109 | 33745 | tokenize
import time: 238 | 33982 | linecache
import time: 352 | 34334 | traceback
import time: 304 | 304 | _weakrefset
import time: 647 | 35284 | threading
import time: 818 | 52472 | subprocess
It’s worth noting that the string of imports at the beginning are triggered regardless of your script — you can easily demonstrate this to yourself by running
python -X importtime -c 'pass'. At the risk of stating the obvious, also note that the imports are listed in reverse order — dependencies are shown before those that import them. For example, you can see above that the cumulative time for
encodings includes the time taken to import
codecs, which in turn includes _codecs.
The other point of interest is to look at the massive impact of importing
re, adding a whopping 32 ms to your startup time! Well OK, of course I fully understand this probably isn’t a big deal in almost any context where you’ve chosen Python as an appropriate language — you can take the software engineer out of embedded development, but you can’t take the reflexive distaste for wasted clock cycles out of the software engineer.
A selection of the small changes are briefly discussed below.
The existing thread-local storage (TLS) API in CPython’s C interface uses int as a key, which isn’t compatible with POSIX, which allows platforms to define any type they like. As it happens it’s usually something castable to
int, which is why it hasn’t been a major issue, but apparently at least Cygwin uses something incompatible. The gory details can be found in PEP 539.
If a module defines a function called __getattr__(), it’s now used in case a module-level attribute lookup fails, in the same way as defining the same method on an object. Additionally defining
__dir__() customises the result of
dir() as well. This can be useful to do things like support deprecated aliases for attributes, or support lazy loading of some features when specific attributes are accessed. See PEP 562 for more details.
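Here’s a self-contained sketch of the deprecated-alias use case. In real code the hook would simply be a top-level function in the module’s own source file; here a hypothetical mymod is built by hand so the example runs in one piece:

```python
import sys
import types
import warnings

# Construct a module object directly, standing in for mymod.py.
mod = types.ModuleType("mymod")
mod.new_name = 42

def module_getattr(name):
    # Support a deprecated alias without keeping a real attribute around.
    if name == "old_name":
        warnings.warn("old_name is deprecated, use new_name",
                      DeprecationWarning, stacklevel=2)
        return mod.new_name
    raise AttributeError(f"module 'mymod' has no attribute {name!r}")

mod.__getattr__ = module_getattr  # the PEP 562 hook
sys.modules["mymod"] = mod

import mymod
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    assert mymod.old_name == 42  # falls through to __getattr__()
```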
Validating .pyc Files By Hash
To determine whether a .pyc file is up to date, Python has always used the last modified timestamp of the source file, which is embedded in the
.pyc file. This works fine in most cases where the files are generated ad hoc, but distributors often want to pre-compile and distribute
.pyc files because the directories into which they’re installed globally are not typically world-writable, so unprivileged users lose the advantages of pre-compilation. Enter PEP 552 which allows the timestamp to be replaced with a hash — this makes the content of the
.pyc files deterministic and repeatable, even across multiple platforms, which is helpful for distributors. The timestamp method is still used by default, but the hash method can be selected when using the py_compile or compileall modules.
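For example, the hash-based mode can be requested explicitly through py_compile:

```python
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "mod.py")
    with open(src, "w") as f:
        f.write("x = 1\n")

    # CHECKED_HASH embeds a hash of the source rather than its timestamp,
    # making the resulting .pyc reproducible across builds and platforms.
    pyc = py_compile.compile(
        src, invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH)
    assert pyc is not None and os.path.exists(pyc)
```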
fromhex() Now Ignores All ASCII Whitespace
Previously the fromhex() class method, on both bytes and bytearray objects, would only ignore ASCII space characters specifically; now all ASCII whitespace is ignored.
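For example:

```python
# All ASCII whitespace is now skipped, not just spaces.
assert bytes.fromhex("de ad\nbe\tef") == b"\xde\xad\xbe\xef"
assert bytearray.fromhex("01\r\n02") == bytearray(b"\x01\x02")
```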
isascii() Method Added
On str, bytes and bytearray objects this returns
True only if every character is valid ASCII.
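For example:

```python
assert "hello".isascii()
assert not "caffè".isascii()        # contains a non-ASCII character
assert b"\x00\x7f".isascii()        # NUL and DEL are still ASCII
assert not bytes([0x80]).isascii()  # anything above 0x7F is not
```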
There’s lots of smaller things to like here, and I’m particularly glad that the troublesome issue of forward-references in type hints has been addressed, even if it requires a
__future__ import before Python 3.10. The performance improvements for
typing are more transparent to the programmer, but hopefully remove some of the hesitancy people may have had about embracing type checking in Python.
The UTF-8 mode and
LC_CTYPE coercion are potentially handy, and I understand why they’ve been added, but I can’t help but wonder if this is just enabling people who should really be sorting out locales on their systems. But it’s probably not the Python community’s job to worry about that sort of thing, so this seems like a pragmatic approach for now.
The improved facilities for developing and testing code are certainly welcome, and hopefully will help people catch problems more readily during unit testing which previously may have crept out to be found in production under heavy load later.
So that wraps up this article — in the next one we’ll be looking at the new and improved library modules in Python 3.7.
Abstract syntax trees or ASTs are an intermediate form which offers a normalised view of source code, which is then turned into bytecode. ↩
Or child processes which are running in an older version of the Python interpreter. ↩
The justification for this is that files are expected to be encoded properly in a known encoding when generated, whereas
stdin and stdout are more likely to contain incorrectly encoded characters, as they’re coming from other tools or user input which wasn’t necessarily generated with the current usage in mind. ↩