What’s New in Python 3.11 - Type Hint Improvements

3 Jan 2023 at 4:20AM in Software

In this series looking at features introduced by every version of Python 3, we continue our look at the new Python 3.11 release, this time examining the new language features around type hints.

This is the 26th of the 28 articles that currently make up the “Python 3 Releases” series.


Having looked at Python 3.11’s performance improvements and new exception support features in previous articles, now we turn our attention to some enhancements to type hinting support in this release.

There’s a series of changes in this area, which I’ve summarised below.

  • Support for variadic type variables with the addition of TypeVarTuple.
  • Marking only some keys as optional in TypedDict.
  • Referring to a class’s own type using the new Self type annotation.
  • Requiring strings composed from literals, not dynamically, using LiteralString.
  • Type annotation dataclass_transform for decorators which preserve dataclass semantics.
  • The future of from __future__ import annotations is uncertain.

As usual, we’ll look at all of these in more detail in the following sections.

Variadic Generics

Since type hinting was first formalised in Python 3.5, the typing module has contained TypeVar to represent type variables. This type has some flexibility to support multiple types, but it only ever represents a single type at a time. With a language like Python which supports fully heterogeneous containers, however, that can be quite an onerous limitation. As a result, Python 3.11 brings us typing.TypeVarTuple, which is equivalent to TypeVar except that it can represent a tuple of arbitrary types.

I’ll try to drill into quite a bit here, but there’s a lot of detailed analysis in PEP 646, so if you really want the nitty gritty details, go check it out.

Example

This is perhaps best illustrated with a code snippet. Here we see a function which returns whatever tuple it’s given but with the first element converted to an int.

from typing import TypeVar, TypeVarTuple

Ts = TypeVarTuple("Ts")

def convert_first_int(values: tuple[int|str|float, *Ts]) -> tuple[int, *Ts]:
    return (int(values[0]), *values[1:])

print(repr(convert_first_int(("1", "2", "3"))))

There’s a bit to unpack in the example above, pun intended with apologies, so let’s go through it line by line.

After the imports, on line 3 we declare a type variable using the new TypeVarTuple — this represents an arbitrary tuple of potentially different types. We need this because our function only deals with the first item of the tuple, and so we need a way to express that we allow any remaining types, and whatever they are in the input they will also be the same in the output.

Now we get to the signature of convert_first_int() on line 5. The values parameter is declared with type tuple[int|str|float, *Ts]. That first type hint of int|str|float is normal enough, it means that first item must be one of int, str or float — the use of the | operator is an application of the more concise format for typing.Union that was added in Python 3.10.

The second clause *Ts is a new use of the * operator to “unpack” the tuple of types represented by the type variable Ts. Thus, the whole specification tuple[int|str|float, *Ts] represents a tuple where the first type is one of the three listed, and the remaining types can be anything. The return type of tuple[int, *Ts] therefore indicates that a tuple will be returned where the first item is always an int, and the remaining types will be whatever they were in the original values.

Limitations

So TypeVarTuple is just like Tuple[T1, T2, ...] where the number of types is arbitrary. An important point to note, however, is that it must always be unpacked with the * operator when used — it would not be valid to use something like values: Ts or values: tuple[Ts] in the example above.

Since this use of * requires a grammar change, in earlier versions of Python it’s not available — the TypeVarTuple type is available via the typing_extensions backport package, but without this grammar change it’s not particularly useful. As a result, there’s also typing.Unpack which has the same effect. So instead of tuple[*Ts] you’d write tuple[Unpack[Ts]] — but I wouldn’t bother in Python 3.11, the asterisk syntax is more concise and clearly what the PEP authors intended people to use where possible.

Also it’s worth noting that, unlike TypeVar, TypeVarTuple doesn’t yet support constrained types or the keyword parameters like bound or covariant. These are likely to come in a future PEP and release, but things have been kept simple for now.

Another important point is that every occurrence of the same TypeVarTuple variable within a given context must refer to the same types. For example, the following code would not be valid:

Ts = TypeVarTuple("Ts")
def my_function(arg1: tuple[*Ts], arg2: tuple[*Ts]) -> None:
    ...

my_function((1, 2), ("3", "4"))  # NOT valid

Finally, only one unpacking is allowed in a given tuple.

Xs = TypeVarTuple("Xs")
Ys = TypeVarTuple("Ys")

a: tuple[*Xs]       # Valid
b: tuple[int, *Ys]  # Valid
c: tuple[*Xs, *Ys]  # NOT valid

Unpacking With *args

The final aspect of TypeVarTuple I’d like to cover is its use with *args. According to PEP 484, if *args is annotated with a type then all the arguments are expected to have that same type.

def my_function(*args: int) -> None:
    ...

my_function(1, 2, 3)    # Valid
my_function(1, 2, "3")  # NOT valid

With TypeVarTuple, however, we can now properly annotate heterogeneous type specifications. Unlike the other examples above, we don’t need to unpack within a tuple because *args is already a tuple — this is the only instance where a type variable like *Ts can be used directly, as opposed to parameterising something else (e.g. tuple[*Ts]).

Ts = TypeVarTuple("Ts")

def my_function(*args: *Ts) -> None:
    ...

my_function(1, 2, "3")  # Inferred as tuple[int, int, str]

This extension of the use of * doesn’t just apply to TypeVarTuple either — other tuple type specifications can be used. Take this code, for example.

def my_function(*args: *tuple[int, *tuple[str, ...], int]) -> None:
    ...

This annotation expects my_function() to be called with a single int, followed by zero or more str values and then a final int at the end. Of course, at runtime they’ll all be passed in args as normal.

Partially Optional Keys in TypedDict

Python 3.8 introduced the TypedDict class as a way to add type hints to specific keys within a dictionary. As described in PEP 589, the keys can either be all required, which is the default, or made all optional by passing total=False to the base class¹, as in the snippet below.

class AllOptional(TypedDict, total=False):
    one: int
    two: str
    three: float

In Python 3.11 this has been further enhanced, as per PEP 655, to allow some fields to be marked as optional whilst others remain required. This has been achieved by adding two new identifiers to the typing module, Required and NotRequired — these might seem a slightly convoluted choice, but given that Optional already means “a type or None”, different nomenclature was needed. The reason both are needed is to cater for cases where total=True as well as total=False.

Example

The snippet below contains two class definitions which are equivalent in their notions of which fields are required and which are optional.

from typing import NotRequired, Required, TypedDict

class First(TypedDict):
    one: int
    two: str|None
    three: NotRequired[float]

class Second(TypedDict, total=False):
    one: Required[int]
    two: Required[str|None]
    three: float

This is all fairly straightforward, but one point to note is the use of str|None instead of the more usual Optional[str]. This is recommended by the PEP because of the understandable confusion should one write Required[Optional[str]], though it would still be both syntactically and semantically correct.

Let’s see an example of mypy identifying a violation of these rules. Firstly, here’s the code.

typeddict.py
from typing import NotRequired, Required, TypedDict

class MyStructure(TypedDict):
    one: int
    two: NotRequired[int]
    three: NotRequired[str]

def my_function(arg: MyStructure) -> None:
    print(f"{arg['one']=}")
    if "two" in arg:
        print(f"{arg['two']=}")
    if "three" in arg:
        print(f"{arg['three']=}")

my_function({"one": 123, "two": 456})
my_function({"three": 789})

And here’s the result of running mypy over it.

$ mypy --python-version 3.11 typeddict.py
typeddict.py:16: error: Missing key "one" for TypedDict "MyStructure"  [typeddict-item]
Found 1 error in 1 file (checked 1 source file)

Interaction With get_type_hints()

One final point is that these hints are filtered out of the results of typing.get_type_hints(), unless you specify include_extras=True in the call.

>>> from typing import NotRequired, Required, TypedDict, get_type_hints
>>>
>>> class First(TypedDict):
...     one: int
...     two: str|None
...     three: NotRequired[float]
...
>>> get_type_hints(First)
{'one': <class 'int'>, 'two': str | None, 'three': <class 'float'>}
>>> get_type_hints(First, include_extras=True)
{'one': <class 'int'>, 'two': str | None, 'three': typing.NotRequired[float]}

Self Type Annotation

It’s quite often the case that one has to refer to the current class in the signature of a method, typically when a method needs to return a new instance of the class. This can sometimes be the case for normal methods, and also for class methods which act as alternative constructors. As of Python 3.11, this has been made more convenient and intuitive by adding typing.Self, whose use I’ll illustrate below.

Instance Methods

Consider adding type annotations to a normal method which must return an instance of the class. To see how the new typing.Self helps us, let’s first see how to do this without it.

A simple approach to this is just to annotate with the name of the class itself, as in the example below.

from __future__ import annotations

from datetime import date
from typing import NamedTuple

class Person(NamedTuple):
    first_name: str
    last_name: str
    date_of_birth: date

    def replace_last_name(self: Person, new_name: str) -> Person:
        return self.__class__(self.first_name, new_name, self.date_of_birth)

This works, but has two issues. Firstly it only works using from __future__ import annotations, whose future is somewhat in doubt (as discussed later in this article) — you could work around that by using a string literal instead, though. Secondly, if Person is subclassed then this method is still annotated as returning the base class, which is going to cause type checkers some problems.

An option which is better in some ways is to use a TypeVar which is bound to the class, as follows.

from datetime import date
from typing import NamedTuple, TypeVar

PersonType = TypeVar("PersonType", bound="Person")

class Person(NamedTuple):
    first_name: str
    last_name: str
    date_of_birth: date

    def replace_last_name(self: PersonType, new_name: str) -> PersonType:
        return self.__class__(self.first_name, new_name, self.date_of_birth)

This works, but it’s fiddly — you need to remember to bind the TypeVar, and you also need to annotate self which isn’t normally done, hence is easy to forget.

Now we can look at the new annotation typing.Self that has been added in Python 3.11. It is essentially just an alias for a TypeVar bound to the current class, as in the example immediately above, but it’s significantly simpler to use — you can see how in the updated example below.

from datetime import date
from typing import NamedTuple, Self

class Person(NamedTuple):
    first_name: str
    last_name: str
    date_of_birth: date

    def replace_last_name(self, new_name: str) -> Self:
        return self.__class__(self.first_name, new_name, self.date_of_birth)

Class Methods

Self can be used in most places you’d expect, and class methods are another common case, so let’s see an example of that as well. In the code snippet below, the Person class has been updated to include a from_csv() method which is passed a string containing a comma-separated row of values, and is expected to construct a Person instance from it and return it.

from datetime import date
from typing import NamedTuple, Self

class Person(NamedTuple):
    first_name: str
    last_name: str
    date_of_birth: date

    @classmethod
    def from_csv(cls, row: str) -> Self:
        first, last, dob = (i.strip() for i in row.split(",", 2))
        return cls(first, last, date.fromisoformat(dob))

Importantly, the code here continues to work if included in a subclass, as nothing here specifically mentions the Person base class in the code or annotations.

String Literal Annotation

Injection attacks are one of the most common ways of subverting software. This is where carefully crafted user input is provided in such a way as to cause software to process it in a way which wasn’t anticipated by the author. One of the most common examples is SQL injection², where user input is used unchecked in a query string.

To prevent this you need to sanitise your inputs somehow before using them. There are various domain-specific ways of doing this, such as parameterised queries in SQL. In addition, some languages also offer more general approaches to this problem, such as the taint checking which is offered by languages like Perl and Ruby.

Python doesn’t offer generalised taint checks, though there are some static analysis tools which claim to do this — the Pyre type checker also includes the Pysa static analyser which performs a taint analysis. Exploring Pyre in detail is still on my “to do” list, however, and may well form the topic of a future blog article.

That said, in Python 3.11 there’s a new type annotation which will be of some help in preventing injection attacks in cases like passing SQL queries. This is the typing.LiteralString annotation, which is described in PEP 675.

You may remember that a somewhat similar-sounding annotation, typing.Literal, was added back in Python 3.8 — I briefly described it in an earlier article. This allowed you to annotate that a particular parameter must have one of a pre-determined set of specific literal values.

LiteralString is a generalisation of this, which permits any string value but only if it has been constructed from literals which are sourced from within the code, and not any user input.

So, if you consider the execute() method of the sqlite3.Cursor object, you could annotate it as follows:

from collections.abc import Iterable
from typing import LiteralString

class Cursor:
    ...
    def execute(
        self,
        sql: LiteralString,
        parameters: Iterable[str]|dict[str, str] = (),
        /):
        ...

At runtime this will have no effect, as with other type annotations — indeed, at runtime a literal string is just a plain str like any other, so this is too late to apply any checks. Within the type checker, this information is already tracked in order to implement the checks for Literal, so LiteralString uses this same machinery — in essence it’s a superset of all possible Literal values.

For completeness, all of the following cases are compliant with LiteralString:

  • An actual literal (e.g. x = "hello").
  • The concatenation of two variables which are themselves LiteralString (e.g. y = x + " world").
  • sep.join(items) is compliant if sep is a LiteralString and items is an Iterable[LiteralString].
  • An f-string can be compliant but only if all of its constituent expressions are LiteralString, and similarly with str.format().
  • The PEP has an appendix which lists the str methods which preserve the LiteralString status of the string.

Whilst full-scale taint checking would be nice, this probably covers a pretty high proportion of the common cases all on its own, since it’s mostly string inputs from users which cause the problems. Hence, I’m glad to see pragmatic steps being taken to allow these security flaws to be detected earlier.

Data Class Decorators

Note: This is a bit of an obscure feature, and unlikely to be of interest unless you want to implement your own equivalent of dataclasses.dataclass in the Python library. I mention this now so you can skip to the next section if you’re not interested.

In Python 3.7 the dataclasses module was introduced, offering the @dataclass class decorator which made a class into a sort of a mutable namedtuple. I went through this in an earlier article in the series.

Type checkers generally have good support for this module and its decorator, since it’s part of the standard library. However, there are also popular third party libraries which offer similar facilities, and these are less well supported — one recently popular example is pydantic, which adds runtime checks on class attributes based on annotations, typically to implement services such as HTTP-based APIs.

The main change here is the addition of a typing.dataclass_transform decorator. This can be applied to a decorator function, class or metaclass, and hints to a type checker that this decorator endows classes it creates with additional dataclass-like behaviours.

Since this is potentially a little confusing, just to clarify — the @dataclass_transform decorator is applied to the decorator that you write which itself is used to decorate classes. Perhaps an example might make this clearer — I won’t give an implementation of the decorator function, since it would unnecessarily complicate this example with a lot of additional code.

from typing import dataclass_transform

# This is the equivalent of dataclasses.dataclass().
@dataclass_transform()
def my_dataclass_decorator(cls):
    """Adds __init__() etc. and returns updated class."""
    ...

# And this is how you would use the decorator defined above.
@my_dataclass_decorator
class MyClass:
    one: int
    two: str

The specific changes that are assumed to be supported by such decorators are:

  • Adding an __init__() method based on the declared data fields.
  • Adding rich comparison methods (i.e. __eq__(), __gt__(), etc.)
  • Supporting “frozen” classes, which indicate the values are immutable.
  • Supporting “field specifiers”, which describe attributes of fields, such as a default value factory function.

However, not all of these are necessarily always implemented by such a decorator. As with @dataclass, it’s assumed that the decorator can take parameters to customise these changes. For example, @dataclass can accept init=False to disable the generation of an __init__() method, or order=True to add additional rich comparison operators like __gt__(), whereas by default only an equality comparison is added.

Type checkers are expected to honour the same parameters as accepted by @dataclass, and assume that they provide the same function. Also, since the default values of these may differ between @dataclass and the third party decorator, the @dataclass_transform decorator itself can take parameters to specify the default in use. For example, if called as @dataclass_transform(eq_default=False) then if the caller of the third party decorator doesn’t provide an eq argument then the value will be assumed to be False — in the standard library @dataclass, the default would be True.

The parameters that @dataclass_transform accepts are listed below, along with the decorator parameter whose default each one sets. The meanings of the decorator params can be found in the standard library documentation for @dataclass.

@dataclass_transform param    Decorator param
eq_default                    eq
order_default                 order
kw_only_default               kw_only

As well as these, there’s also a field_specifiers parameter to @dataclass_transform, which specifies a list of supported classes to describe fields — i.e. classes which provide equivalent functionality to that of dataclasses.field in the standard library.

The only other aspect that I want to mention here is that there is a small runtime change as well as the annotation aspect — a new attribute __dataclass_transform__ is added to the decorator function or class for introspection purposes. This will be a dict containing the parameters which were passed to @dataclass_transform.

That’s about as much detail as I’d like to go into, but do have a read through PEP 681 if you want the full details. You might also like to peruse the original PEP 557 which described the original dataclasses module.

From Future Import No Annotations?

The final change I’d like to discuss here isn’t really a change but a lack of a change — or perhaps a change of plans. First some context, for those who don’t keep up with Python development closely.

Back in Python 3.7, PEP 563 was introduced which postponed evaluation of annotations — instead of being processed at parsing time, they’re preserved in __annotations__ in string form for later use. The main goals of this PEP were twofold:

  • Support forward references, such as when a method of a class needs to return an instance of the class. Since the class definition isn’t finished yet, it’s not valid to refer to it at that point in the compilation process.
  • Avoid the overhead of executing annotations at module import time, as these won’t be used most of the time.
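The forward-reference problem is easy to demonstrate without the __future__ import by falling back on a manual string literal, which is essentially what PEP 563 automates for every annotation in a module (Node here is a made-up example):

```python
from typing import get_type_hints

class Node:
    # Node isn't defined yet while its body is being compiled, so
    # without PEP 563 the annotation must be a string literal; the
    # __future__ import simply makes all annotations behave this way.
    def clone(self) -> "Node":
        return Node()

print(Node.clone.__annotations__["return"])  # the string "Node"
print(get_type_hints(Node.clone)["return"])  # resolves to the real class
```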

These changes were present in Python 3.7, but had to be activated with the use of from __future__ import annotations at the start of the source file. The original intention was to then make this behaviour the default in Python 3.10 — i.e. the __future__ import would no longer be required, and the change would impact everyone.

Around April 2021, however, this change was deferred and moved out of Python 3.10. This was done because it transpired that various people had started to use annotations for purposes other than type hints, and deferring their execution would break this code. The example which seemed to be creating the most noise was the pydantic project, and its use in the FastAPI framework — this is apparently a Python framework for building HTTP-based APIs.

Roll on Python 3.11, and the decision appears to have been deferred again. However, this time the announcement explicitly mentioned the possibility that PEP 563 may never be accepted as the default.

In the interest of full transparency, we want to let the Python community know that the Steering Council continues to discuss PEP 563 (Postponed Evaluation of Annotations) and PEP 649 (Deferred Evaluation Of Annotations Using Descriptors). We have concluded that we lack enough detailed information to make a decision in favor of either PEP. As we see it, adopting either PEP 563 or PEP 649 as the default would be insufficient. They will not fully resolve the existing problems these PEPs intend to fix, will break some existing code, and likely don’t address all the use cases and requirements across the static and dynamic typing constituents. We are also uncertain as to the best migration path from the current state of affairs.

The PEP 649 to which they refer is an alternative approach which involves deferring the construction of __annotations__ to the point where it’s first queried, after which point forward references will have most likely been resolved. It also means the overhead of constructing it is only incurred when it’s actually queried — the value is cached, so the overhead is still only incurred once.

You might also like to read this message from Łukasz Langa, the author of PEP 563, where he discusses his take on the situation (as of April 2021). I think it’s a really clear summary of the issues, and a great way to catch up.

So what are we to do with all this right now?

Well, it seems to me that it’s unlikely that PEP 563 will be accepted to become the default at this point — it breaks things for a sizeable community of users, and there doesn’t seem to be any way of preventing that without a major change on one side or the other. None of us have a crystal ball, mind you, that’s just my opinion.

That said, PEP 649 isn’t accepted at all as yet, so the from __future__ import co_annotations it uses can’t be used even in Python 3.11. As a result, as I see it there are only two options for most developers who want to use type hints:

  • Continue to use PEP 563 until PEP 649 or some other option is accepted.
  • Don’t write code that uses forward references.

The second option did get significantly easier in a few common cases with the addition of typing.Self in Python 3.11, so maybe that’s the way I’ll be going — I can usually find ways to structure my code around any other use of forward references, since I’m rather used to doing the same sort of thing in C/C++ anyway.

Frankly, it’s not a great situation. If you can avoid PEP 563 features for now, your life is likely to be easier in future. That said, I wouldn’t hold back from using type hints just for this reason: updating later probably isn’t a massively difficult change, particularly if you only use from __future__ import annotations in the specific source files where it’s needed, since that gives you something easy to search for when working out which files need updating as things change.

Overall, though, I definitely hope some conclusive decision is taken before Python 3.12 — I suspect in terms of total overall pain caused, the current indecision is probably hurting more than either of the choices would do.

I think that’s the major language changes in Python 3.11 covered now, so in the next article I’ll mop up any of the smaller changes I think are worth mentioning, plus make a start discussing the updates to the standard library modules.

Conclusion

That’s about it for this topic. The subject of type hints is starting to get into the weeds now, but that’s a promising sign — it probably means that the simple problems are all solved and there’s relatively little to hold people back from type hinting almost all of their codebase.

Of the items I’ve discussed above, I have to say that typing.Self, arguably the simplest feature, is the one I’m going to most appreciate. The rest of them are a little more niche, but mostly I’m glad that type hinting is still getting a good deal of attention as I feel it’s a really solid step any developer can take to not only catch bugs earlier, but make their code more comprehensible to newcomers.

I think that’s about it for the major new features in Python 3.11, so in the next article I’ll cover any of the smaller changes I think are worth highlighting, and also take a look at the new modules added, plus make a start on looking at the updates to existing modules.


  1. More accurately, keyword parameters in the base class specification are passed to the __init_subclass__() class method of the base class, which was introduced in Python 3.6 and which I discussed in a previous article in this series. 

  2. If you’re writing code which uses SQL and you don’t know about SQL injection, go read up on it right now (or at least before you write any more code) before some Little Bobby Tables teaches you the lesson in a more painful way. 
