Python 2to3: What’s New in 3.5 - Part 2, Type Hinting

2 May 2021 at 1:06PM in Software

In this series looking at features introduced by every version of Python 3, this is the second looking at Python 3.5. In it we examine another of the significant new features in this release, type hinting.

This is the second article looking at features added in Python 3.5. It was quite a milestone release in a number of ways, so I’m trying to give the more major features proper coverage. Last time we looked at coroutines; this time it’s the turn of type hinting.

The syntax for annotating function arguments was added way back in Python 3.0, but the syntax was all that was specified: the semantics of the annotations were left as an exercise for each programmer to define themselves. Probably unsurprisingly, the use to which most people put this syntax was type annotations. Since Python’s dynamic typing can sometimes make it a little tricky to work out what type is expected in a particular context, type annotations are a helpful form of documentation. Furthermore, they enable static analysis tools to perform correctness checking which wouldn’t be possible without annotations. For these reasons, type information is an obvious candidate for annotations.

In response to this, the Python maintainers decided that type annotations made sense as an addition to the standard library, so everyone could benefit from a standard method to apply these annotations. This allows tooling to develop around this standard, which is much less likely to happen if everyone uses subtly different ways to achieve the same end, and it’s also just less work for everyone.

There are two important points to stress here, however. The first is that although this release adds features to the standard library to add type annotations in a standardised way, it doesn’t actually add any type-checking features to the language or the library. Fortunately when 3.5 was released there was already the mypy utility to perform this static analysis, and it makes full use of the new syntax.

The second point to stress is that this is still, and probably always will be, an optional feature. Nobody is obliged to perform type annotation, either by language requirements or by convention — it’s simply a feature available for anyone who wants to use it. Things have been carefully set up so that code using type hints can be freely mixed with code that doesn’t and all of the runtime behaviour is identical.
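
To make that last point concrete, here’s a minimal sketch showing that annotations are stored as metadata but never enforced when the code runs. The double() function is a made-up name purely for illustration.

```python
import typing


def double(value: int) -> int:
    # The annotations are recorded on the function object, nothing more.
    return value * 2


# Tools can read the hints back, which is how checkers and IDEs use them.
hints = typing.get_type_hints(double)
print(hints)  # {'value': <class 'int'>, 'return': <class 'int'>}

# Passing a str breaks the hint, but Python itself runs it without complaint.
print(double("ab"))  # abab
```

A static checker such as mypy would be expected to flag that final call, but the interpreter doesn’t care either way.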

Subtyping

Before we jump in and see some code, I wanted to say a few words about subtypes. This is because it underpins all of the rules implemented by the Python type-checkers, so it’s important to be familiar with it. Some of you may already be experts, so you might like to skim this section.

You’re probably quite used to hearing about subtyping as a synonym for subclassing in inheritance hierarchies. One key point to note, however, is that subtyping is a relationship between any types, however they’re declared.

Colloquially, declaring that TypeSub is a subtype of TypeSuper is essentially saying that any code which expects TypeSuper would also work successfully with TypeSub. But how do we define this more rigorously? It boils down to two requirements:

  • Every possible value for TypeSub must also be a possible value for TypeSuper.
  • Every method from TypeSuper must also be callable on TypeSub.

If both of these requirements hold true then TypeSub is indeed a subtype of TypeSuper. An example of this is that int is a subtype of float, since integers are a subset of the real numbers [1]. The canonical example is that a subclass is a subtype of all its parent classes.
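
As a rough sketch of those two examples, consider the code below. The Animal and Dog classes and the describe() and halve() functions are hypothetical names I’ve invented for illustration, not anything from the standard library.

```python
class Animal:
    def speak(self) -> str:
        return "..."


class Dog(Animal):
    # Dog supports every method of Animal, and adds one of its own.
    def speak(self) -> str:
        return "Woof"

    def fetch(self) -> str:
        return "Fetching"


def describe(animal: Animal) -> str:
    # Only relies on the Animal interface, so any subtype will do.
    return "It says " + animal.speak()


def halve(number: float) -> float:
    return number / 2


print(describe(Dog()))  # It says Woof
print(halve(3))  # 1.5
```

Both calls should satisfy a type checker: Dog is a subclass of Animal, and int is treated as acceptable wherever float is expected.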

Right, that’s enough theory, let’s look at some practice.

Simple Type Hints

Let’s kick off by considering an extremely simple example:

def my_function(arg1: int) -> float:
    return arg1 * 2.5

print(my_function(100))
print(my_function(5.5))
print(my_function(my_function(200)))

As is hopefully obvious, we’re declaring my_function() here to take a single int argument and return a float result. Then we call this function a few times, including a couple of times that break the type-checking rules. Python will execute this fine and no errors or warnings will be emitted. If we run it under mypy, however, we see the problems:

type-hints.py:5: error: Argument 1 to "my_function" has incompatible type "float"; expected "int"
type-hints.py:6: error: Argument 1 to "my_function" has incompatible type "float"; expected "int"
Found 2 errors in 1 file (checked 1 source file)

So far so simple, looks like this is going to be a very short article!

Container Types

Slightly less simple is the issue of container types such as list and dict. These are more complicated not just because they contain other types, but also because they can contain heterogeneous types (i.e. values within them can have different types to each other). Let’s ignore the issue with heterogeneity for now, we’ll come back to that a little later, and just consider homogeneous cases (i.e. every contained item has the same type).

To represent these we have our first encounter with the new typing module. This provides a number of utility classes which are useful for declaring type hints. It’s important to note that these generic classes are not equivalents to the types themselves — you can’t construct an instance of them, you can only use them with type annotations.

Taking the case of list as an example, there is a class typing.List to represent this. On its own, that would indicate an unrestricted heterogeneous list, which can contain any types at all. This isn’t a particularly useful type hint, however, so you can use square brackets to indicate the type of the contained item, as in typing.List[int]. This indicates that you expect every item in that list to be an int.

The code snippet below illustrates various combinations:

import typing

def one(arg: typing.List) -> None:
    pass

def two(arg: typing.List[int]) -> None:
    pass

def three(arg: typing.List[float]) -> None:
    pass

one([1,2,3])
two([1,2,3])
three([1,2,3])

one([1.1, 2.2, 3.3])
two([1.1, 2.2, 3.3])
three([1.1, 2.2, 3.3])

one([1, 2, 3.3])
two([1, 2, 3.3])
three([1, 2, 3.3])

The output of running mypy is as follows:

type-hints-containers.py:17: error: List item 0 has incompatible type "float"; expected "int"
type-hints-containers.py:17: error: List item 1 has incompatible type "float"; expected "int"
type-hints-containers.py:17: error: List item 2 has incompatible type "float"; expected "int"
type-hints-containers.py:21: error: List item 2 has incompatible type "float"; expected "int"
Found 4 errors in 1 file (checked 1 source file)

All the calls to one() are fine, because the type hint indicates the items in the list can be of any types. Similarly, all the calls to three() are fine because both int and float are subtypes of float. The problems are on line 17, where all three float values are a mismatch for the List[int] argument, and line 21 where the single float in the list is a mismatch.

Within the typing module are classes to represent the builtin container types, including those from modules such as collections. These include:

  • DefaultDict
  • Deque
  • Dict
  • FrozenSet
  • List
  • Set

You’ll notice Tuple is missing from the list. This is because it’s covered below in the Typing Primitives section. The reason it’s different is that all of the above are typed homogeneously: every item in the container has the same type specifier. As you’ll see later, however, Tuple has a different specifier for each element [2].
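
Here’s a small sketch of a couple of the other container classes in use, along with a check that the generic classes themselves can’t be instantiated. The count_words() and unique_lengths() functions are made-up examples, and I’ve used a PEP 484 type comment for the local variable since that’s the form available in 3.5.

```python
import typing


def count_words(words: typing.List[str]) -> typing.Dict[str, int]:
    counts = {}  # type: typing.Dict[str, int]
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


def unique_lengths(words: typing.List[str]) -> typing.Set[int]:
    return {len(word) for word in words}


print(count_words(["to", "be", "or", "not", "to", "be"]))
print(unique_lengths(["to", "be", "or", "not"]))

# The typing classes are for annotations only, so constructing one fails.
try:
    typing.List()
except TypeError as exc:
    print("Cannot instantiate:", exc)
```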

There’s also a typed version of collections.namedtuple called typing.NamedTuple. This is actually a concrete rather than generic type which is used to actually declare the class rather than just as a type hint. These two declarations are functionally equivalent, aside from the addition of the type hints:

Student = typing.NamedTuple(
    "Student",
    [("name", str), ("address", str), ("age", int)]
)

Student = collections.namedtuple(
    "Student",
    ["name", "address", "age"]
)

As an aside, some additional features in Python 3.6 make these rather easier to define, which I’ll cover in a future article.
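
To back up the claim that the two are functionally equivalent, here’s a quick sketch comparing them side by side. I’ve renamed the untyped version to PlainStudent so both can coexist, and the field values are invented.

```python
import collections
import typing

Student = typing.NamedTuple(
    "Student",
    [("name", str), ("address", str), ("age", int)]
)

PlainStudent = collections.namedtuple(
    "PlainStudent",
    ["name", "address", "age"]
)

typed = Student("sophia", "12 Example Street", 20)
plain = PlainStudent("sophia", "12 Example Street", 20)

# Both behave like ordinary named tuples at runtime.
print(typed.name, typed.age)  # sophia 20
print(typed._fields == plain._fields)  # True
print(tuple(typed) == tuple(plain))  # True
```

The only difference is that a type checker knows typed.age is an int, whereas it has no information about the fields of plain.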

Abstract Containers

That still leaves us with some questions, however — how would we indicate that we want our function to take an iterable, but we don’t really care about the specific type, whether it’s list, tuple, or anything else?

Thankfully typing still has us covered, and provides classes that correspond to the different abstract container types provided by collections.abc. For example, if we just want any object that’s a read-only Sequence (i.e. provides __len__() and __getitem__()) of int then we can declare the parameter as typing.Sequence[int].
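
As a minimal sketch of this, the hypothetical total() function below only relies on __len__() and __getitem__(), so it works with a list, a tuple or a range alike.

```python
import typing


def total(values: typing.Sequence[int]) -> int:
    result = 0
    # Only the Sequence interface is used: len() and indexing.
    for index in range(len(values)):
        result += values[index]
    return result


print(total([1, 2, 3]))  # 6
print(total((4, 5, 6)))  # 15
print(total(range(4)))  # 6
```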

Here’s a list of the classes based on those container abstract base classes. Most of the names are the same, but a few of them differ so I’ve included the corresponding classes in collections.abc as well. If you want details of the supported operations, check out the collections.abc module documentation.

typing            collections.abc
AbstractSet       Set
AsyncGenerator    AsyncGenerator [3]
AsyncIterable     AsyncIterable
AsyncIterator     AsyncIterator
Awaitable         Awaitable
ByteString        ByteString [4]
Container         Container
Coroutine         Coroutine [5]
Generator         Generator [6]
Hashable          Hashable
ItemsView         ItemsView
Iterable          Iterable
Iterator          Iterator
KeysView          KeysView
Mapping           Mapping
MappingView       MappingView
MutableMapping    MutableMapping
MutableSequence   MutableSequence
MutableSet        MutableSet
Reversible        Reversible [7]
Sequence          Sequence
Sized             Sized
ValuesView        ValuesView

There are also some additional classes that don’t correspond to equivalents in collections.abc, but still represent abstract types:

  • SupportsAbs for any type that provides __abs__().
  • SupportsFloat for any type that provides __float__().
  • SupportsInt for any type that provides __int__().
  • SupportsRound for any type that provides __round__().
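
As a quick sketch of how these might be used, the made-up as_percentage() function below accepts anything that can be converted with float(), whether that’s a float, an int or a Fraction.

```python
import typing
from fractions import Fraction


def as_percentage(ratio: typing.SupportsFloat) -> str:
    # float() works on any object that provides __float__().
    return "{:.1f}%".format(float(ratio) * 100)


print(as_percentage(0.25))  # 25.0%
print(as_percentage(Fraction(1, 8)))  # 12.5%
print(as_percentage(3))  # 300.0%
```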

Other Types

To finish off, a few oddments that didn’t fit into the earlier sections:

  • IO for IO stream types, although most of the time you probably want to use one of its more specific subclasses:
    • TextIO is a subclass of IO[str], for text streams.
    • BinaryIO is a subclass of IO[bytes], for binary streams.
  • Pattern and Match for the objects used by the re module.
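
Here’s a minimal sketch of the stream types in use, with in-memory buffers from the io module standing in for real files. The write_greeting() and write_header() functions are hypothetical.

```python
import io
import typing


def write_greeting(stream: typing.TextIO, name: str) -> None:
    # A text stream accepts str values.
    stream.write("Hello, " + name + "\n")


def write_header(stream: typing.BinaryIO) -> None:
    # A binary stream accepts bytes values.
    stream.write(b"\x89PNG")


text_buffer = io.StringIO()
write_greeting(text_buffer, "sophia")
print(repr(text_buffer.getvalue()))  # 'Hello, sophia\n'

binary_buffer = io.BytesIO()
write_header(binary_buffer)
print(binary_buffer.getvalue())  # b'\x89PNG'
```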

Mixing Types

For more complicated situations, accepting a single type, or a homogeneous container containing a single type, is not sufficient. For these cases, the typing module offers some more facilities.

Let’s look at something more complicated. The code below introduces using our own classes with type hints, and also the use of typing.Union:

import typing

class Named:
    def __init__(self, name: str = "andy") -> None:
        self._name = name

    def get_name(self) -> str:
        return self._name.title()

class EnglishGreeter(Named):
    def greet(self, salutation: str) -> str:
        salutation = salutation[0].upper() + salutation[1:].lower()
        return salutation + ", my name is " + self.get_name()

class FrenchGreeter(Named):
    def greet(self, salutation: str) -> str:
        salutation = salutation[0].upper() + salutation[1:].lower()
        return salutation + ", je m'appelle " + self.get_name()

class GermanGreeter(Named):
    def greet(self, salutation: str) -> str:
        salutation = salutation.title()
        return salutation + ", ich heiße " + self.get_name()

def show_name(arg: Named) -> None:
    print("The name is " + arg.get_name())

def show_greeting(
    greeter: typing.Union[FrenchGreeter, GermanGreeter],
    salutation: str,
) -> None:
    print("Here is your greeting: " + greeter.greet(salutation))

english = EnglishGreeter("sophia")
french = FrenchGreeter("chloë")
german = GermanGreeter("hannah")
name_only = Named()

show_name(english)
show_greeting(english, "hello")
show_name(french)
show_greeting(french, "bonjour")
show_name(german)
show_greeting(german, "guten tag")
show_name(name_only)
show_greeting(name_only)

There are a few minor points to note first. First, note that self parameters don’t bother getting hinted. Second, note that __init__() is annotated as returning nothing (as it does return nothing). Third, note that None can be used directly even though really this is a value; this is a special case and can be taken to mean type(None) in type hints. Finally, this snippet illustrates subclasses being subtypes with the subclasses of Named being passed to show_name() which expects a Named instance. This is exactly as we’d expect, of course, but it’s nice to see it demonstrated.

The significant new feature here is on line 29, where we use typing.Union to represent a group of types. A type is a subtype of this union if it’s a subtype of any of the types listed. This means that show_greeting() will accept only a FrenchGreeter or GermanGreeter, or a subclass of either of them, but no other types.

If you run this through mypy, it confirms this:

type-hints-unions.py:40: error: Argument 1 to "show_greeting" has incompatible type "EnglishGreeter"; expected "Union[FrenchGreeter, GermanGreeter]"
type-hints-unions.py:46: error: Missing positional argument "salutation" in call to "show_greeting"
type-hints-unions.py:46: error: Argument 1 to "show_greeting" has incompatible type "Named"; expected "Union[FrenchGreeter, GermanGreeter]"
Found 3 errors in 1 file (checked 1 source file)

The issue on line 40 is that we’re passing an EnglishGreeter which doesn’t match anything within the union. It runs fine when executed, mind you, it’s only because of our type hinting that mypy raises that error.

There are two issues on line 46, the first of which is simply that we’re missing the second required argument to show_greeting(). The second issue is once again because Named is not a subtype of any of the classes mentioned in the Union.

Typing Primitives

Now we’ve seen Union in action, let’s briefly look at the semantics of this and the other primitives that typing offers for defining types.

Any

First up is Any which matches any type whether assigned or being assigned to. This is subtly different to using object which is the supertype for all other types. Let’s say you define two functions accept_any(arg: typing.Any) and accept_obj(arg: object). They will both accept a parameter of any type, since anything is a subtype of either of them. However, if you then attempt to pass that value into another function you’ll find a difference — the value passed into accept_obj() can only be used with another function taking object or Any. However, the parameter to accept_any() can be passed into any other function regardless of the type.
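
Here’s a rough sketch of that difference, using the accept_any() and accept_obj() names from above plus a hypothetical needs_int() function to pass the value on to.

```python
import typing


def needs_int(value: int) -> int:
    return value + 1


def accept_any(arg: typing.Any) -> int:
    # Any is compatible in both directions, so a checker allows this call.
    return needs_int(arg)


def accept_obj(arg: object) -> int:
    # A checker would be expected to reject needs_int(arg) directly here,
    # because object is not a subtype of int. Narrowing the type first is fine.
    if isinstance(arg, int):
        return needs_int(arg)
    raise TypeError("expected an int")


print(accept_any(1))  # 2
print(accept_obj(1))  # 2
```

In other words, Any switches checking off for that value, whereas object keeps checking on and just says we know nothing about it yet.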

The main function of the Any type is to act as the default of every parameter or return type which isn’t otherwise annotated. This is the mechanism which allows source code to be incrementally annotated and still benefit from type-checking, instead of getting no benefit until every piece of code is annotated.

Union[t1, t2, …]

As we’ve already seen, this declares a list of types, a subtype of any of which will be acceptable. The order of the arguments is irrelevant and the type is automatically simplified in the obvious ways: for example, nested unions are flattened, and if the same union includes both a type and subtype then the subtype is removed as it’s irrelevant. The ultimate example of this is that if you include object then you’ll find the whole thing just evaluates to object, since any other types are by definition subtypes of it. (This describes the runtime behaviour in 3.5; from Python 3.7 onwards, explicit subclasses are no longer removed from unions at runtime.)

Optional[t]

This is a simple alias for Union[t, None].

Tuple[t1, t2, …]

This matches a tuple whose values correspond to the types in the order specified. For example, Tuple[int, str] matches (123, "hello"). The number of arguments is always fixed, although there is a special case for Tuple[t, ...] using the ellipsis token (i.e. three dots). This specifies a variadic homogeneous tuple, i.e. any number of items, but they’re all of the same specified type.

Callable[[t1, t2, …], tr]

Matches a callable object (e.g. function) with the specified signature. The initial list of types specifies the types of the parameters, and the final tr specifies the return type. There’s no way to specify optional or keyword arguments, and no support for specifying variadic functions, but you can skip checking the parameter list by using the ellipsis token: Callable[..., tr].
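
To illustrate the last three, here’s a short sketch with one function per primitive. All of the function names and values are invented for the example.

```python
import typing


def find_user(user_id: int) -> typing.Optional[str]:
    # Returns None when the ID is unknown, hence Optional.
    users = {1: "sophia", 2: "hannah"}
    return users.get(user_id)


def describe(entry: typing.Tuple[int, str]) -> str:
    # A fixed-length tuple with a different type in each position.
    number, label = entry
    return "{}: {}".format(number, label)


def mean(values: typing.Tuple[float, ...]) -> float:
    # A variadic homogeneous tuple: any length, all floats.
    return sum(values) / len(values)


def apply_twice(func: typing.Callable[[int], int], value: int) -> int:
    # func must take a single int and return an int.
    return func(func(value))


print(find_user(1))  # sophia
print(find_user(99))  # None
print(describe((123, "hello")))  # 123: hello
print(mean((1.0, 2.0, 3.0)))  # 2.0
print(apply_twice(lambda n: n + 3, 10))  # 16
```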

Type Aliases

Because the objects in typing are just general classes (albeit with some restrictions), they can be assigned to variables. This allows you to create aliases for specific types, rather like typedef in C/C++.

Here’s some code to calculate the total length of a path of line segments in 3D space. Consider how verbose the signatures would be without being able to declare Point and Path even in this extremely simple example:

import math
import typing

Point = typing.Tuple[float, float, float]
Path = typing.Sequence[Point]

def line_length(x: Point, y: Point) -> float:
    sum_squares = sum((x[i] - y[i]) ** 2 for i in range(3))
    return math.sqrt(sum_squares)

def path_segments(path: Path):
    x, y = iter(path), iter(path)
    next(y, None)
    return zip(x, y)

def path_length(path: Path) -> float:
    return sum(line_length(x, y) for x, y in path_segments(path))
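
For a feel of how the aliases read at the call site, here’s a self-contained usage sketch. It’s a compacted variant of the code above rather than the same implementation (it pairs points by slicing instead of using two iterators), and the square path is made-up data.

```python
import math
import typing

Point = typing.Tuple[float, float, float]
Path = typing.Sequence[Point]


def path_length(path: Path) -> float:
    # Pair each point with its successor and sum the segment lengths.
    return sum(
        math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(3)))
        for a, b in zip(path, path[1:])
    )


# Three sides of a unit square in the z=0 plane.
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
print(path_length(square))  # 3.0
```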

Generics and Type Variables

The final feature of type hints that I’m going to discuss is generic functions and type variables. To illustrate these, let’s consider a generic function for concatenating strings. Let’s say we want to make it work for both str and bytes, we could do something like this using the machinery we’ve learned so far:

from typing import Union

U = Union[str, bytes]

def concatenate(first: U, second: U) -> U:
    return first + second

concatenate("hello", " world")
concatenate(b"hello", b" world")
concatenate("hello", b" world")

This isn’t too bad except that the third call to concatenate() demonstrates a problem — based on that specification, there’s nothing to constrain first and second to be the same type as each other, since they’re all independent unions.

Just for giggles, let’s see what mypy tells us for this code:

type-hints-type-vars.py:6: error: Unsupported operand types for + ("str" and "bytes")
type-hints-type-vars.py:6: error: Unsupported operand types for + ("bytes" and "str")
type-hints-type-vars.py:6: note: Both left and right operands are unions
Found 2 errors in 1 file (checked 1 source file)

So even though our type hinting wasn’t up to scratch, mypy has our back and lets us know we run the risk of mixing str and bytes. However, it would be better to clarify that both parameters must be the same type using hinting, and we can do so using a type variable. This is illustrated below:

from typing import TypeVar

T = TypeVar("T")

def concatenate(first: T, second: T) -> T:
    return first + second

concatenate("hello", " world")
concatenate(b"hello", b" world")
concatenate("hello", b" world")

At this point all we’ve said is that concatenate() takes two parameters of the same type, and returns that type also. Any C++ programmers among you might be finding this somewhat similar to templating and it’s definitely got similarities, but of course in Python it’s just for type checking purposes. This function is known as a generic function.

So what does mypy make of this code now?

type-hints-type-vars.py:6: error: Unsupported left operand type for + ("T")
Found 1 error in 1 file (checked 1 source file)

It still has an issue with line 6 which is that since the type of T is unconstrained there’s nothing to stop us passing variables of a type that doesn’t support the + operator such as set. OK, so let’s update line 3 to constrain this variable to str or bytes:

T = TypeVar("T", str, bytes)

Now re-running mypy yields a slightly cryptic error message, but it’s now latched onto the error on the correct line:

type-hints-type-vars.py:10: error: Value of type variable "T" of "concatenate" cannot be "object"
Found 1 error in 1 file (checked 1 source file)
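
For reference, here’s the whole example with the constrained type variable in place, as a self-contained sketch. I’ve wrapped the deliberately bad call in a try block so the script runs to completion, since mixing str and bytes fails at runtime as well.

```python
from typing import TypeVar

# T may only ever be str or bytes, and must be the same for every use
# within a single call.
T = TypeVar("T", str, bytes)


def concatenate(first: T, second: T) -> T:
    return first + second


print(concatenate("hello", " world"))  # hello world
print(concatenate(b"hello", b" world"))  # b'hello world'

# This is the call mypy objects to; Python raises TypeError for it too.
try:
    concatenate("hello", b" world")  # type: ignore
except TypeError as exc:
    print("TypeError:", exc)
```

As it happens, the typing module already provides a type variable constrained in this way, called AnyStr, so in real code you could probably use that instead of declaring your own.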

Conclusions

I think that more or less covers the basics of the type hints added in Python 3.5. There are some additional details I’ve glossed over, such as covariance and contravariance of types. If you want to know more of the gory details then I suggest PEP 483 as a starting point for some discussion of the theory behind type checking, and then PEP 484 for more specifics on the implementation.

Type hinting is something I’ve always danced around the edges of with Python since I never took the time to get a proper grounding in it, but now I’ve gone through in more detail it’s definitely something I’ll be looking to make more use of. I must admit I don’t often run into type mismatch bugs in my own code, since I generally find if there’s type ambiguity then it’s often a sign of sloppy code structure that should be tidied up. That said, of course it’s happened that I’ve used, say, a date here and a datetime there and it’s led to some annoying issues that don’t always show up in simple unit test cases.

Even ignoring the value of type hints to find bugs, however, there’s also a huge value in expressing the programmer’s expectations to anyone reading the code. This helps with understanding new code, as well as identifying bugs at code review time. In my opinion that’s a bigger benefit than enabling the static type checks, although clearly it’s best to have both.

That’s it from me. Next time I’ll be looking at the remaining smaller syntax enhancements for matrix multiplication and iterable unpacking, and some other additions to the standard library as well.


  1. OK, so before I get lots of angry comments, I know this isn’t strictly true. The mathematical statement that integers are a subset of the real numbers is true, but in Python you can represent numbers with int that you can’t with float. For example, try doing float(int(sys.float_info.max) * 10). However, in my defense, mypy does allow int to be used anywhere where float or complex is expected, which is essentially treating int as a subtype of these other types.

  2. Randomly shuffling topics around is my cunning plan to present the appearance that I think about the structure of my articles in advance. Clever, eh? As long as I’m not daft enough to tell you that’s what I’m doing, it’s pretty convincing. 

  3. Strictly speaking this wasn’t added to collections.abc until Python 3.6, but it’s in typing in 3.5 so I’m still covering it here. It uses the same type specification as Generator except that async generators cannot return a value so there’s only two types to specify, the type to yield and the type to send. 

  4. This represents bytes, bytearray and memoryview, and as a shorthand bytes can be used for any argument of those types. 

  5. Uses the same type specification as Generator, see the footnote for that for details. 

  6. A generator needs up to three types specified: the type that’s yielded from the generator, the type that’s expected to be sent to the generator, and the type that can be returned from the generator. The syntax for specifying the generator type is Generator[type_yield, type_send, type_return]. A generator which is expected to yield integers and not expected to receive any sent values or to return a value would be specified with Generator[int, None, None].

  7. If you want to get technical, Reversible wasn’t added to collections.abc until Python 3.6, but it was added to typing in 3.5 so I’m including it here.

Photo by David Clode on Unsplash