What’s New in Python 3.10 - Pattern Matching

6 Nov 2022 at 10:45PM in Software

In this series looking at features introduced by every version of Python 3, we take a first look at Python 3.10, focusing on the new features in the language and library. In this post we’ll cover the new structural pattern matching feature.

This is the 21st of the 22 articles that currently make up the “Python 3 Releases” series.


Python 3.10 was released on October 4th 2021 and, having looked through the release notes, it seems to include quite a few significant features. Not least of these is a major new language feature called structural pattern matching, but the release also includes some significant improvements to error reporting, especially for syntax errors, which is very helpful during the development cycle. There are also some useful extensions to type hints, such as a more convenient syntax for unions, some more powerful ways to type hint function parameters, and an explicit way to declare type aliases to help type checkers make the right deductions.

This release is one that I’ve been using for at least a few months, but I haven’t had a chance to explore any of these new features in detail yet, so I’m excited to learn more. In this article I’m going to focus on the most substantial change in this release, the new pattern matching feature.

Introduction

Pattern matching is a big new feature in Python, sufficiently fundamental that it’s introduced by not one but three PEPs.

  • PEP 634 contains the technical specification of the feature, without any discussion of motivation or rationale for it.
  • PEP 635 discusses the justification for adding the feature to the language.
  • PEP 636 is a gentle introduction and tutorial for those not familiar with pattern matching.

The feature is similar to those with which you may be familiar in other languages such as Rust, Scala, Ruby or Haskell. For the avoidance of doubt, the use of the term “pattern” here is nothing to do with regular expressions or the re module — this is a language syntax feature.

It’s a flexible feature whose use-cases stretch from a more elegant replacement of a lengthy if/else block to complex capture and extraction of values from inside other data structures. We’ll start off with some simple cases and build up from there.

Matching Literals

In its simplest form, it’s similar to the switch statement in C and related languages, so let’s look at a simple example of that first.

>>> def ordinal_digit(n):
...     match n:
...         case 1:
...             return f"{n}st"
...         case 2:
...             return f"{n}nd"
...         case 3:
...             return f"{n}rd"
...         case _:
...             return f"{n}th"
...
>>> print(" ".join(ordinal_digit(i) for i in range(1, 10)))
1st 2nd 3rd 4th 5th 6th 7th 8th 9th

For anyone familiar with switch this should be quite similar, though note that unlike in C and friends you do not need to terminate a case statement with break — there is no implicit “fall-through” to the next block. Even for those not so acquainted it should be fairly obvious how this equates to a simple if/else structure.
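To make that equivalence concrete, here is a rough sketch of the same logic written as a plain if/elif chain. The name ordinal_digit_if is my own, chosen only for this illustration.

```python
def ordinal_digit_if(n):
    """Same behaviour as ordinal_digit(), using if/elif instead of match."""
    if n == 1:
        return f"{n}st"
    elif n == 2:
        return f"{n}nd"
    elif n == 3:
        return f"{n}rd"
    else:
        # Plays the role of "case _:" in the match version.
        return f"{n}th"


print(" ".join(ordinal_digit_if(i) for i in range(1, 10)))
# 1st 2nd 3rd 4th 5th 6th 7th 8th 9th
```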

One slightly novel detail here is that the _ pattern is a wildcard that always matches, regardless of the value — it even matches None. This is equivalent to the default keyword in some other languages. It’s also important to note that case statements are processed in order, so case _: will always occur at the end — indeed, if it doesn’t then you’ll get a SyntaxError.

>>> match 100:
...     case _:
...         print("Default")
...     case 100:
...         print("100")
...
  File "<stdin>", line 2
SyntaxError: wildcard makes remaining patterns unreachable

One other observation is that if there is no case _: and none of the other patterns match, the statement does nothing and execution proceeds beyond it as normal, just like a false if statement without an else.

Guards

So far, so simple: this is really a very thin dusting of syntactic sugar on a conventional if/else block. The match keyword is considerably more flexible than this as we’ll see below, but for right now we’ll stick in the realms of if statements and look at guards.

This is where a case statement can have an additional conditional added to it, and the block is only chosen if the value matches the pattern and the guard evaluates to a true value. We can use this to make our ordinal_digit() function above more flexible.

>>> def ordinal(n):
...     match n % 10:
...         case 1 if not 3 < n % 100 < 21:
...             return f"{n}st"
...         case 2 if not 3 < n % 100 < 21:
...             return f"{n}nd"
...         case 3 if not 3 < n % 100 < 21:
...             return f"{n}rd"
...         case _:
...             return f"{n}th"
...
>>> print(" ".join(ordinal(i) for i in (1, 2, 3, 4, 5, 11, 12, 20, 21, 22, 101, 111, 132)))
1st 2nd 3rd 4th 5th 11th 12th 20th 21st 22nd 101st 111th 132nd

A guard is only evaluated if its case pattern matches, and guards are permitted to have side-effects, which may be useful. They are particularly useful when combined with the more flexible matching of patterns that we’ll see in later sections.

Alternation

A pattern may match against multiple values using | for alternations. See for example below a clumsy attempt at speeding up a primality test by checking for common cases.

>>> def check_prime_slow(number):
...     '''Return True iff number is prime.'''
...     raise NotImplementedError()
...
>>> def check_prime(number):
...     match number:
...         case 2 | 3 | 5 | 7 | 11:
...             return True
...         case _ if number % 2 == 0:
...             return False
...         case _:
...             return check_prime_slow(number)
...
>>> check_prime(2)
True
>>> check_prime(3)
True
>>> check_prime(4)
False
>>> check_prime(5)
True
>>> check_prime(6)
False
>>> check_prime(9)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 8, in check_prime
  File "<stdin>", line 3, in check_prime_slow
NotImplementedError

Sequence Patterns

Patterns can also capture values from sequences, rather like the unpacking rules for assigning a sequence to a list of variables. This is illustrated in the example below.

>>> def summarise_list(items):
...     match items:
...         case [item]:
...             print(f"Your one item is {item}")
...         case [first, second]:
...             print(f"Your two items are {first} and {second}")
...         case [first, *rest, last]:
...             print(f"Your first and last items are {first} and {last},"
...                   f" and you have {len(rest)} more")
...
>>> summarise_list(["apple"])
Your one item is apple
>>> summarise_list(["apple", "orange"])
Your two items are apple and orange
>>> summarise_list(["apple", "orange", "pear", "pineapple", "lemon"])
Your first and last items are apple and lemon, and you have 3 more

You can match a fixed length list, or you can match a sequence of items not otherwise matched using the * prefix. As you can see in the example above, this can occur anywhere in the matched pattern, but for hopefully obvious reasons you can’t have more than one in a pattern.

>>> match [1,2,3,4,5]:
...     case [a, *b, *c, d]:
...         print(a, b, c, d)
...
  File "<stdin>", line 2
SyntaxError: multiple starred names in sequence pattern

It’s also important to note that sequence patterns can contain constants as well as capture variables, and these are matched and captured as you’d expect.

>>> match ["delete", "entry.txt"]:
...     case ["add", filename]:
...         print(f"Going to create {filename}")
...     case ["delete", filename]:
...         print(f"Going to delete {filename}")
...
Going to delete entry.txt

One final subtlety to mention here is that if you’re using the alternation syntax described earlier, every alternate is required to capture the same set of variables. This avoids the situation where there is ambiguity over which captured variables will be defined in the body of the case block.

>>> match [1, 2]:
...     case [a] | [a, b]:
...         print(a)
...
  File "<stdin>", line 2
SyntaxError: alternative patterns bind different names

Mapping Patterns

In a similar way to sequences, mappings can also be matched and have values extracted from them.

>>> import datetime
>>> def birthday(person):
...     today = datetime.date.today()
...     match person:
...         case {"birth_month": today.month, "birth_date": today.day}:
...             print(f"Happy Birthday, {person['name']}")
...         case {"birth_month": today.month}:
...             print(f"{person['name']}, it's your birthday this month")
...
>>> person_1 = {"name": "Guido", "birth_date": 31, "birth_month": 1}
>>> person_2 = {"name": "Chonchita", "birth_date": 6, "birth_month": 11}
>>> person_3 = {"name": "Andy", "birth_date": 4, "birth_month": 11}
>>> birthday(person_1)
>>> birthday(person_2)
Happy Birthday, Chonchita
>>> birthday(person_3)
Andy, it's your birthday this month

As you can see, this works in a very similar way to sequences. The main difference is that matching against a mapping implicitly only looks for the subset of fields mentioned in the case pattern.

Class Patterns

Finally, patterns can also be matched based on the type of the expression provided. In its simplest form, this can be done by specifying the name of a class with a pair of empty brackets after it.

>>> def my_type_of(arg):
...     match arg:
...         case str():
...             return "String"
...         case int():
...             return "Integer"
...         case float():
...             return "Float"
...         case _:
...             raise Exception("Not supported")
...
>>> my_type_of(123)
'Integer'
>>> my_type_of("123")
'String'
>>> my_type_of(123.0)
'Float'
>>> my_type_of(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 10, in my_type_of
Exception: Not supported

In addition to matching the type, the attributes of objects can be matched and/or captured using a keyword parameter-like syntax.

>>> from typing import NamedTuple
>>> class Point(NamedTuple):
...     x: int
...     y: int
...
>>> def get_axis(point: Point) -> str:
...     match point:
...         case Point(x=0, y=0):
...             return "Origin"
...         case Point(x=0, y=offset):
...             return f"Y-axis {offset} from origin"
...         case Point(y=0):
...             return "X-axis"
...         case _:
...             return "Not on any axis"
...
>>> get_axis(Point(17, 13))
'Not on any axis'
>>> get_axis(Point(17, 0))
'X-axis'
>>> get_axis(Point(0, 13))
'Y-axis 13 from origin'
>>> get_axis(Point(0, 0))
'Origin'

It’s also possible to do this matching positionally as well as by keyword. This comes for free with some standard library classes such as namedtuple and those declared with the @dataclasses.dataclass decorator. However, for other classes the ordering of attributes isn’t well defined, so to support positional matching you’ll have to define a dunder class member called __match_args__ which is a tuple defining the order in which attributes should be matched.

>>> class City:
...     __match_args__ = ("continent", "name")
...     def __init__(self, name, continent="Europe", population=0):
...         self.name = name
...         self.continent = continent
...         self.population = population
...
>>> city1 = City("London", "Europe", 9500000)
>>> city2 = City("London", "North America", 10200)
>>> city3 = City("Perth", "Australia", 2200000)
>>> city4 = City("Perth", "Europe", 47400)
>>>
>>> def welcome(city):
...     match city:
...         case City("North America"):
...             return "Welcome to America"
...         case City("Australia"):
...             return "Welcome down under"
...         case City("Europe", "London"):
...             return "Welcome to London"
...         case City():
...             return "Welcome to somewhere"
...
>>> welcome(city1)
'Welcome to London'
>>> welcome(city2)
'Welcome to America'
>>> welcome(city3)
'Welcome down under'
>>> welcome(city4)
'Welcome to somewhere'

As you can see, the attribute order doesn’t need to match that in __init__() or indeed anything else, and you can match against as many or as few of the attributes specified as you wish in any given case pattern.

Capture vs. Constant

Since captures are effectively declaring new variables, it’s important to note that an unqualified name (i.e. with no dots) will always be taken to be a capture parameter. If you try to refer to a name in the local scope as a constant against which to match, you’ll probably regret it.

>>> SOME_CONSTANT = 123
>>> match 200:
...     case SOME_CONSTANT:
...         print(f"Matched against {SOME_CONSTANT}")
...
Matched against 200
>>> SOME_CONSTANT
200

Note that this is not an issue with scope, it’s an issue with syntax — even if a variable is accessed from another scope, it’ll still be assigned.

>>> SOME_CONSTANT = 123
>>> def func(value):
...     global SOME_CONSTANT
...     match value:
...         case SOME_CONSTANT:
...             print(f"Matched {SOME_CONSTANT}")
...
>>> func(300)
Matched 300
>>> SOME_CONSTANT
300

However, if your value is qualified using at least one . then this works as you’d expect — it’s only unqualified names you need to worry about.

>>> import signal
>>> signal.SIGKILL
<Signals.SIGKILL: 9>
>>> match 9:
...     case signal.SIGINT:
...         print("Interrupt")
...     case signal.SIGTERM:
...         print("Terminate")
...     case signal.SIGKILL:
...         print("Kill")
...
Kill

Nested Matching

The final aspect that I’d like to discuss is the fact that these patterns can be nested within each other for quite powerful results. The sample code below combines some of the techniques described so far to implement a (grossly insufficient) YAML encoder.

def simple_yaml_encode(values, indent=0):
    """Simple encoder for a small subset of YAML."""

    prefix = indent * " "
    for item in values.items():
        match item:
            case key, str(value) | int(value):
                print(f"{prefix}{key!r}: {value!r}")
            case key, list(items) | tuple(items):
                print(f"{prefix}{key!r}:")
                for element in items:
                    print(f"{prefix}    - {element!r}")
            case key, value:
                print(f"{prefix}{key!r}:")
                simple_yaml_encode(value, indent + 4)


simple_yaml_encode(
    {
        "Languages": ("English", "French", "German"),
        "Numbers": {
            "English": {1: "one", 2: "two", 3: "three"},
            "French": {1: "un", 2: "deux", 3: "trois"},
            "German": {1: "ein", 2: "zwei", 3: "drei"},
        },
    }
)

As you can see above, constructs such as alternations and captures are applied recursively to the patterns, and they can combine to quite concisely express some unpacking options.

Conclusion

That’s it for this article. I was impressed by the flexibility of this feature — I kept trying to push it in directions I thought wouldn’t work, but more or less everything I could think of to try could be achieved. I’m sure there are some constructs which are too complex to implement with a single pattern, but as often seems the case I suspect things will get fairly unreadable if you push against its limitations too hard anyway. Frankly that YAML encoding above is already pushing what I consider to be unreadably dense code.

Overall, this is a great feature that I’m very happy to see finally join the language, and it’s a bonus that it’s in such a flexible form. I think what remains to be seen is the extent to which this feature will be “enough rope” for developers to construct impenetrable or buggy code, but I’m encouraged by the extent to which issues such as unreachable patterns trigger syntax errors instead of runtime bugs.

Next time I’ll be looking at some of the smaller language enhancements in 3.10, such as better error reporting and type hinting.
