In this series looking at features introduced by every version of Python 3, we continue our tour of the new Python 3.11 release, covering some smaller new features, two new modules and some of the library changes.
This is the 27th of the 32 articles that currently make up the “Python 3 Releases” series.
So far in this series we’ve looked at the major new features in Python 3.11, specifically the performance improvements, exception enhancements and new type hint features. In this article we’ll take a brief look at a few other smaller features, a couple of new modules that have been added, and a few of the standard library module changes.
We’ll kick off with a selection of smaller changes which stood out as being of interest to me.
Unpacking Expressions in for Loops

Unparenthesized unpacking expressions now work in for loops, as a consequence of the new PEG parser, which could lead to some more concise code.
>>> one = (11, 22, 33)
>>> two = (400, 500)
>>> for i in *one, *two:
... print(i)
...
11
22
33
400
500
It was discovered that subclasses of some builtins don't pickle and copy properly if additional attributes are declared using __slots__. The snippet below, from Python 3.10, shows how such an attribute isn't preserved when an instance of a bytearray subclass is pickled and unpickled.
Python 3.10.0 (default, Dec 3 2021, 01:57:53) [Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import pickle
>>> class MyArray(bytearray):
... __slots__ = ("some_attr",)
...
>>> orig = MyArray(b'123')
>>> orig.some_attr = "hello"
>>> pickled = pickle.loads(pickle.dumps(orig))
>>> pickled.some_attr
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'MyArray' object has no attribute 'some_attr'
The same example in Python 3.11 shows that the attribute is correctly pickled and unpickled.
Python 3.11.1 (main, Dec 12 2022, 08:56:30) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import pickle
>>> class MyArray(bytearray):
... __slots__ = ("some_attr",)
...
>>> orig = MyArray(b'123')
>>> orig.some_attr = "hello"
>>> pickled = pickle.loads(pickle.dumps(orig))
>>> pickled.some_attr
'hello'
For some time the pickling of objects could be customised by adding a __getstate__() method to act as a serialisation function, and then implementing __setstate__() to re-initialise the object instance from the serialised state. The serialisation of __slots__ has been implemented by adding a default __getstate__() to object, which you can see by calling it yourself.
>>> orig.__getstate__()
(None, {'some_attr': 'hello'})
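If you need behaviour beyond the default, these hooks can still be supplied manually. Below is a minimal sketch of my own (not taken from the release notes) showing a __slots__ class implementing both methods; the class and attribute names are purely illustrative.

import copy
import pickle

class Point:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getstate__(self):
        # Return whatever is needed to rebuild the instance later.
        return {"x": self.x, "y": self.y}

    def __setstate__(self, state):
        # Re-initialise the instance from the serialised state.
        self.x = state["x"]
        self.y = state["y"]

orig = Point(1, 2)
restored = pickle.loads(pickle.dumps(orig))
assert (restored.x, restored.y) == (1, 2)

# copy.copy() goes through the same hooks, so it works too.
assert copy.copy(orig).y == 2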
Normally Python will prepend the current directory to sys.path so modules can always be imported from it. However, this means that standard library or other module names can be shadowed, unintentionally or maliciously, by a conflicting module in the current directory. To prevent this there is a new option in Python 3.11 which disables this implicit inclusion.
You can trigger this mode by passing -P
on the Python interpreter command-line, or by setting the PYTHONSAFEPATH
environment variable. This just removes the current directory from the initial sys.path
value, as you can see below (output reformatted slightly for readability).
$ python --version
Python 3.11.1
$ python -c 'import sys; print(sys.path)'
['', '/Users/andy/.pyenv/versions/3.11.1/lib/python311.zip',
'/Users/andy/.pyenv/versions/3.11.1/lib/python3.11',
'/Users/andy/.pyenv/versions/3.11.1/lib/python3.11/lib-dynload',
'/Users/andy/.pyenv/versions/python-3.11/lib/python3.11/site-packages']
$ python -P -c 'import sys; print(sys.path)'
['/Users/andy/.pyenv/versions/3.11.1/lib/python311.zip',
'/Users/andy/.pyenv/versions/3.11.1/lib/python3.11',
'/Users/andy/.pyenv/versions/3.11.1/lib/python3.11/lib-dynload',
'/Users/andy/.pyenv/versions/python-3.11/lib/python3.11/site-packages']
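To make the risk this guards against concrete, here's an illustrative session of my own (filename and output invented for the example): a file called tempfile.py in the current directory shadows the standard library module of the same name unless -P is given.

$ echo 'print("shadowed!")' > tempfile.py
$ python -c "import tempfile"
shadowed!
$ python -P -c "import tempfile"
$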
As I mentioned briefly in an earlier article in this series, Python moved to SipHash in Python 3.4 from the previous Fowler-Noll-Vo hash function — this was to address concerns which I discussed in an old article.
SipHash family functions are described as "SipHash-x-y", where x and y are two parameters which determine the level of security:

- x is the number of hashing rounds applied per block of input.
- y is the number of rounds applied during finalisation.
As first implemented in Python 3.4, SipHash-2-4 was chosen as the variant to use. In Python 3.11, however, the SipHash-1-3 variant has been added, which trades some of the security for better performance on larger inputs. This new variant has also been set as the algorithm used in the standard build, but the original SipHash-2-4 is still available and can be chosen at compile-time, so some vendors may choose to do that in their builds.
$ python3.10 -c "import sys; print(sys.hash_info.algorithm)"
siphash24
$ python3.11 -c "import sys; print(sys.hash_info.algorithm)"
siphash13
The consensus in the Python development team seems to follow similar discussions among Rust and Ruby developers that the new variant is secure enough for conceivable current use-cases, and results in a nice performance boost in at least some cases.
I very much agree with this assessment. It's arguable that this leaves things more open to as-yet unknown DoS attacks in the future, but these hash values tend to be used only transiently — persisting them for long periods would lead to serious headaches if the algorithm is ever changed again, so that would be a bad idea for application and library authors regardless of this issue. As a result, the long-term security of the values isn't really a significant concern, and if the algorithm is good enough for known attacks now then it's probably not worth burning CPU cycles to add additional security of highly questionable value.
str to int Conversions

You may well be aware that, using the int() constructor, Python can convert a string of digits in any base from 2 to 36 (inclusive) into an int value.
>>> int("1111011", 2)
123
>>> int("173", 8)
123
>>> int("123", 10)
123
>>> int("7b", 16)
123
>>> int("3f", 36)
123
However, what you might not be aware of is that doing this in any base which isn't a power of 2 is a comparatively expensive operation. In the example below, I've constructed some very long numbers and you can see that converting from bases 8 or 16 is extremely quick, but base 10 is significantly slower.
>>> timeit.timeit("int(x, 8)", setup="x='67' * 1000", number=100000)
0.6754992799833417
>>> timeit.timeit("int(x, 10)", setup="x='67' * 1000", number=100000)
3.333605546038598
>>> timeit.timeit("int(x, 16)", setup="x='67' * 1000", number=100000)
0.6724714860320091
Given that these are unusually long strings, none of them are what you could call slow, but the difference is very noticeable. The power-of-2 bases both take around 6.7 µs per iteration, whereas base 10 conversions take 33.3 µs, an order of magnitude slower. If an attacker can find a way to feed extremely large integers into your code, therefore, they can perform a DoS attack on the host it's running on. This was logged as CVE-2020-10735, and it was addressed by limiting the number of digits that you can convert from or to a base which isn't a power of 2.
The limit is applied on both input and output, so printing is also covered. It defaults to 4300 digits, so the limit isn’t exactly restrictive for the vast majority of use-cases.
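As a quick sketch of how the limit shows up in practice (the exact exception wording may differ between releases):

import sys

print(sys.get_int_max_str_digits())     # 4300 by default

try:
    int("1" * 5000)                     # base 10, over the limit
except ValueError as exc:
    print(exc)                          # message mentions the digit limit

# Power-of-2 bases aren't limited, since those conversions are linear time.
big = int("1" * 5000, 16)
print(len(hex(big)))                    # 5002: "0x" plus 5000 hex digits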
But is this a fuss about nothing? Well, you can alter the limit using sys.set_int_max_str_digits()
, so let’s see how bad things get with 100,000 digits.
>>> import sys
>>> import timeit
>>> sys.set_int_max_str_digits(100000)
>>> timeit.timeit("int(x, 16)", setup="x='1' * 100000", number=1000)
0.3209301750175655
>>> timeit.timeit("int(x, 10)", setup="x='1' * 100000", number=1000)
60.41181180800777
Here we can see that the base 16 case takes 321 µs per iteration, whereas the base 10 one takes 60 ms, which is two orders of magnitude longer. Considering that attackers could potentially feed code orders of magnitude more digits than this, the concern seems justified. The behaviour is also rather non-linear — the same thing for a million digits takes a massive 6 seconds for a single iteration.
>>> sys.set_int_max_str_digits(1000000)
>>> timeit.timeit("int(x, 10)", setup="x='1' * 1000000", number=1)
6.158579874027055
The potential for abuse is fairly obvious here, so adding the limitation certainly seems like a very sensible step.
Next up we’ll take a quick look at the two new modules added to the standard library in Python 3.11, namely tomllib
and wsgiref.types
.
tomllib

If you're looking for a markup language for configuration or other data, there are quite a few options — it's not particularly difficult to invent such a format, and there's always going to be someone who thinks they can roll their own to improve on flaws, perceived or genuine, within the existing options. Formats which have enjoyed a decent level of popularity include INI, XML, JSON, YAML and TOML.
The last of these is the baby of the bunch, just coming up for its 10th birthday. Whilst it’s not without its critics, it received something of a boost in the Python community when it was chosen by PEP 518 for storing build dependencies and other metadata. The only significant argument against it at that time was the lack of inclusion in the standard library of a module for parsing it. Now, nearly seven years after that PEP was published, that has finally been addressed.
The new tomllib library that's been added to Python 3.11 is a simple affair, providing the facility only to parse TOML data, not to serialise it back in the other direction.
It provides two functions:

- load() takes a binary file object and parses TOML from it.
- loads() takes a string and parses TOML from that.

There's not a great deal to configure here. The TOML syntax encodes the type as well as the value, so you get structured information back in the native data types you'd expect — for example, a table becomes a dict and an array becomes a list. The one type worth calling out is float, as it's possible to provide a different function to parse these. The default is float(), as you'd expect, but you could, for example, pass parse_float=decimal.Decimal to produce a Python Decimal for each TOML float instead.
>>> from pprint import pprint
>>> import tomllib
>>>
>>> pprint(tomllib.loads("""
... some_attr = "some string"
... another_attr = 123.456
...
... # Now a table
... [mytable]
... one = ["un", "ein"]
... two = ["deux", "zwei"]
... three = ["trois", "drei"]
... """))
{'another_attr': 123.456,
'mytable': {'one': ['un', 'ein'],
'three': ['trois', 'drei'],
'two': ['deux', 'zwei']},
'some_attr': 'some string'}
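For completeness, here's a sketch of the file-based entry point, reading a pyproject.toml (any TOML file would do) and using Decimal for floats:

import tomllib
from decimal import Decimal

# load() requires a file opened in binary mode.
with open("pyproject.toml", "rb") as fileobj:
    config = tomllib.load(fileobj, parse_float=Decimal)

print(sorted(config))                   # top-level table and key names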
So that’s about it for tomllib
, short and sweet. It’s definitely nice that you can parse pyproject.toml
without requiring third party libraries, but you’ll still need one to edit or create TOML files in code — perhaps a future PEP might also add this functionality to tomllib
.
wsgiref.types

The other new module in Python 3.11 is a simple extension to wsgiref, the reference implementation of WSGI in the standard library, to add types that can be used for type hints. The types all correspond to the usage defined in PEP 3333, and are listed below; a short usage sketch follows the list.

- StartResponse: a typing.Protocol describing the type of the start_response() callable, which is invoked to start the HTTP response. It's responsible for returning a callable to send data to the client.
- WSGIEnvironment: a type alias for the WSGI environment dictionary (a dict[str, Any]).
- WSGIApplication: a type alias for the WSGI application callable itself.
- InputStream and ErrorStream: typing.Protocol classes describing the types of the input and error streams, as described by PEP 3333.
- FileWrapper: a typing.Protocol describing the interface provided by a file wrapper, an abstraction which iterates blocks of data from a file.
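As a sketch of how these might be used, here's a trivial WSGI application annotated with the new types.

from wsgiref.types import StartResponse, WSGIEnvironment

def application(environ: WSGIEnvironment, start_response: StartResponse) -> list[bytes]:
    # A minimal application which ignores the request and returns a fixed body.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello from a type-hinted WSGI application\n"]

The annotations are purely for the benefit of static type checkers; the application can be served with wsgiref.simple_server as usual.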
To finish off this article, I'm going to make a start looking at some of the updates to a few of the standard library modules, with the rest covered in the final article in this series. I'm going to kick off with some changes to text processing modules, comprising:

- re: the regular expression matching library, adding support for atomic grouping.
- string: new methods to validate instances of string.Template.

An update has been made to the regular expression parser which adds support for atomic grouping. An atomic group is a bit like a nested regular expression, which is matched but then throws away any backtracking positions stored against any of the tokens within the group. To put it another way, the atomic group's first match is always locked in; even if the rest of the string fails to match, there won't be any backtracking to see whether the atomic group could have matched something different.
If you take /a(bc|b)c/
as an example regular expression, then this would normally match both abcc
, with the parenthesised group matching bc
, and also abc
, with the parenthesised group matching b
.
However, if we used the atomic group syntax (?>...)
, making the pattern /a(?>bc|b)c/
then this will still match abcc
but abc
will not match. This is because the parenthesised group matches the bc
of abc
, and since it’s an atomic group this is “locked in” so when the final c
fails to match then there’s no backtracking to try other possibilities.
>>> import re
>>>
>>> re.match("a(bc|b)c", "abc")
<re.Match object; span=(0, 3), match='abc'>
>>> re.match("a(bc|b)c", "abcc")
<re.Match object; span=(0, 4), match='abcc'>
>>>
>>> re.match("a(?>bc|b)c", "abcc")
<re.Match object; span=(0, 4), match='abcc'>
>>> re.match("a(?>bc|b)c", "abc")
>>>
Possessive quantifiers, which have also been added in this release, are a simpler form of this. You may already be aware that quantifiers (i.e. things like *
and +
) default to being greedy, which means they consume as much of the input string as possible before moving on within the pattern. If the match fails, they’ll then backtrack, but they start off on that basis. You can modify that to be lazy, however, which means they’ll match as little as possible before moving on. Again, they’ll still match more on backtracks, but the priority order of matching is reversed. To do this you add an additional ?
after the quantifier.
As of Python 3.11 you can instead append an additional +
after the quantifier to make it possessive. This acts just like a greedy quantifier, in that it matches as much as it can, but it doesn’t then backtrack to try to match less.
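To illustrate the difference in behaviour, as opposed to performance, compare a greedy quantifier with its possessive equivalent: the greedy version gives back a character on backtracking so the overall match succeeds, whereas the possessive version refuses and the match fails.

>>> import re
>>> re.match(r"a*a", "aaaa")
<re.Match object; span=(0, 4), match='aaaa'>
>>> re.match(r"a*+a", "aaaa") is None
True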
So why is this useful? Well, in principle one application may be in optimising the performance of expressions which might incur significant backtracking effort before finally failing to match. Let’s take the example of /\b(return|retry|re)\b/
matched against the string returns
. Once the parser has matched return
, but then failed because it’s not followed by a word boundary, then logically we know that neither of the others can match — they’re both shorter than return
and aren’t prefixes of it. However, the regex parser lacking such logic would try to backtrack and retry all the possibilities unnecessarily.
A note of caution, however: I’ve not found this always borne out by experience. I can only assume that either some overhead of using atomic groups outweighs the performance benefits in simple cases, or the matcher has some optimisation which does something similar automatically and more efficiently behind the scenes. You can see below the supposedly more efficient atomic grouping version is actually slower.
>>> setup_str = "import re; regex = re.compile(r'\b(return|retry|re)\b')"
>>> timeit("regex.search('returns')", setup=setup_str)
0.10913390596397221
>>> setup_str = "import re; regex = re.compile(r'\b(?>return|retry|re)\b')"
>>> timeit("regex.search('returns')", setup=setup_str)
0.19278872106224298
I was able to find cases where possessive quantifiers made a positive difference, though, so I think there’s potential value here — but you probably have to be fairly experienced to reliably tell the cases where it’ll help vs hinder. Personally I tend to use regular expressions as a tool of last resort for parsing anyway, so I’m doubtful that I’ll ever use these particular features — but there’s certainly no harm in being aware of them just in case.
>>> setup_str = "import re; regex = re.compile(r'<[^>]*>')"
>>> timeit("regex.match('<aaaaaaaaaaaaaaaaaaaaaaaaaaaaa')", setup=setup_str)
0.36812841705977917
>>> setup_str = "import re; regex = re.compile(r'<[^>]*+>')"
>>> timeit("regex.match('<aaaaaaaaaaaaaaaaaaaaaaaaaaaaa')", setup=setup_str)
0.32666016498114914
When people need string templating, they’ll often turn to third party modules such as Jinja. As I’ve mentioned in a previous article I’m a big fan of Jinja, but its power and complexity are overkill in simple cases. For these cases it’s easy to forget that there’s string.Template
sitting right there in the standard library.
>>> from string import Template
>>> template = Template("$first's on first, $second's on second, $third's on third.")
>>> template.substitute({"first": "Who", "second": "What", "third": "I Don't Know"})
"Who's on first, What's on second, I Don't Know's on third."
It’s always been a little light on features, as a matter of design, but in Python 3.11 it has a couple of additional methods which could be useful. The first of these is get_identifiers()
which lists the parameters to the template — i.e. those keys which the dict
passed to substitute()
would be expected to supply.
>>> template.get_identifiers()
['first', 'second', 'third']
The second method is is_valid()
, which checks to see whether the template has any errors which would cause a call to substitute()
to raise a ValueError
. The snippet below illustrates an example of an invalid template.
>>> bad_template = Template("$person owes me $100")
>>> bad_template.is_valid()
False
>>> bad_template.substitute({"person": "Andy"})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/andy/.pyenv/versions/3.11.1/lib/python3.11/string.py", line 121, in substitute
return self.pattern.sub(convert, self.template)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/andy/.pyenv/versions/3.11.1/lib/python3.11/string.py", line 118, in convert
self._invalid(mo)
File "/Users/andy/.pyenv/versions/3.11.1/lib/python3.11/string.py", line 101, in _invalid
raise ValueError('Invalid placeholder in string: line %d, col %d' %
ValueError: Invalid placeholder in string: line 1, col 17
These methods will likely prove particularly useful where templates are defined in one part of the code but substituted in another. In particular, if a library accepts a template as a parameter then these functions allow the library to do some basic sanity checks on the template it's been passed before it's used. For example, if you use a template to define the format of a warning email to send to customers, it's nice to discover it has a bug in it before you're in a situation where such emails actually need to be sent.
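As a sketch of that sort of up-front check, where the function and the set of expected fields are purely illustrative:

from string import Template

EXPECTED_FIELDS = {"person", "amount"}

def check_template(template: Template) -> None:
    # Reject templates with malformed placeholders up front.
    if not template.is_valid():
        raise ValueError("template contains invalid placeholders")
    # Reject placeholders we won't be able to supply values for.
    unknown = set(template.get_identifiers()) - EXPECTED_FIELDS
    if unknown:
        raise ValueError(f"unknown placeholders: {', '.join(sorted(unknown))}")

check_template(Template("$person owes me $$$amount"))    # passes silently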
The second, and final, area of the standard library changes we'll look at in this article is a few changes around data types:

- datetime has a new convenience alias for the UTC zone, and more flexible parsing of ISO 8601 formats.
- enum has a whole raft of assorted changes including a new StrEnum class, improvements to string representations, and decorators for validation and conversion.

The changes to datetime
are fairly simple, particularly the first one — you can now use datetime.UTC
as an alias for datetime.timezone.utc
. Because UTC is used so commonly, this simple change is actually quite convenient.
The second change is around the fromisoformat()
methods of date
, time
and datetime
classes. Previously these methods had a fairly limited scope which was to parse any format that the isoformat()
methods would generate. As of Python 3.11, they should accept any of the formats defined in ISO 8601, with the exception of those which use fractional hours and minutes — those of you who hate the idea that you might need to say "15 seconds" instead of "0.25 minutes" are in for disappointment, I'm afraid.
One aspect of the parsing that’s a little more flexible than the ISO standard is that the T
separator can be any non-numeric character — in particular a space will also work, which is pretty helpful. However, it must be a single character, so if you’re passing strings supplied by a user, you may still want to normalise them a little.
>>> import datetime
>>> datetime.UTC
datetime.timezone.utc
>>> datetime.date.fromisoformat("2023-01-15")
datetime.date(2023, 1, 15)
>>> datetime.time.fromisoformat("13:15:25.112")
datetime.time(13, 15, 25, 112000)
>>> datetime.datetime.fromisoformat("2023-01-15T13:15:25.112Z")
datetime.datetime(2023, 1, 15, 13, 15, 25, 112000, tzinfo=datetime.timezone.utc)
>>> datetime.datetime.fromisoformat("2023-01-15T13:15+0100")
datetime.datetime(2023, 1, 15, 13, 15, tzinfo=datetime.timezone(datetime.timedelta(seconds=3600)))
>>> datetime.datetime.fromisoformat("2023-01-15 13:15")
datetime.datetime(2023, 1, 15, 13, 15)
>>> datetime.datetime.fromisoformat("2023-01-15  13:15")  # note the two spaces
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: Invalid isoformat string: '2023-01-15  13:15'
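If you do need to accept slightly scruffy user input, a hypothetical normalisation step like the one below might be enough, collapsing runs of whitespace down to the single-character separator that fromisoformat() expects.

import datetime
import re

def parse_user_timestamp(text: str) -> datetime.datetime:
    # Collapse any run of whitespace to a single space before parsing.
    return datetime.datetime.fromisoformat(re.sub(r"\s+", " ", text.strip()))

print(parse_user_timestamp("2023-01-15    13:15"))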
The enum
module has had a lot of love in this release, with a number of improvements in different areas. It’s rather hard to summarise them in some unifying manner, so I’ll just jump right in.
StrEnum and ReprEnum

The class hierarchy within enum
has changed a little in this release, with the specialisations using a new ReprEnum
class. Whereas Enum
uses its own representations for both the __repr__()
and __str__()
of instances, ReprEnum
only uses __repr__()
from the base Enum
— it leaves the __str__()
representation the same as the specialised type it’s representing (e.g. int
for IntEnum
). This allows enumeration values to behave more like their real types in more situations.
There’s also a new StrEnum
specialisation for when enumeration values should be treated as strings. Whereas a normal Enum can have string values, its members aren't themselves usable as strings. StrEnum
, however, always makes its values strings, and will raise TypeError
if any of its values are not strings or trivially coerced to strings.
The snippet below shows various ways in which Enum
and StrEnum
differ in their behaviour.
>>> import enum
>>> class NormalEnum(enum.Enum):
... ONE = "one"
... TWO = ("two",)
...
>>> class StringEnum(enum.StrEnum):
... ONE = "one"
... TWO = ("two",)
...
>>> NormalEnum.TWO
<NormalEnum.TWO: ('two',)>
>>> StringEnum.TWO
<StringEnum.TWO: 'two'>
>>> str(NormalEnum.ONE)
'NormalEnum.ONE'
>>> str(StringEnum.ONE)
'one'
>>> NormalEnum("two")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/andy/.pyenv/versions/3.11.1/lib/python3.11/enum.py", line 715, in __call__
return cls.__new__(cls, value)
^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/andy/.pyenv/versions/3.11.1/lib/python3.11/enum.py", line 1131, in __new__
raise ve_exc
ValueError: 'two' is not a valid NormalEnum
>>> StringEnum("two")
<StringEnum.TWO: 'two'>
>>>
>>> class Foo(enum.Enum):
... ONE = 123
...
>>> class Foo(enum.StrEnum):
... ONE = 123
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/andy/.pyenv/versions/3.11.1/lib/python3.11/enum.py", line 558, in __new__
raise exc
File "/Users/andy/.pyenv/versions/3.11.1/lib/python3.11/enum.py", line 259, in __set_name__
enum_member = enum_class._new_member_(enum_class, *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/andy/.pyenv/versions/3.11.1/lib/python3.11/enum.py", line 1276, in __new__
raise TypeError('%r is not a string' % (values[0], ))
TypeError: 123 is not a string
The values of StrEnum
are usable in most situations where str
is accepted, provided code is well-behaved and uses isinstance()
to check for subclasses of str
. If code checks specifically for str
itself, of course, it will fail.
>>> isinstance(StringEnum.ONE, str)
True
>>> type(StringEnum.ONE) == str
False
Flag and IntFlag Boundaries

The Flag enumeration is used where you need a single value which represents the state of multiple binary flags — this is what you'd use a bit field for in some other languages.
The change for this class in Python 3.11 is the addition of a boundary
class parameter to indicate how out-of-range values should be handled — i.e. if bits are set which don’t correspond to any of the items defined in the enumeration. The valid values for this parameter are defined by the enum.FlagBoundary
enumeration, and are as follows:
- STRICT: out-of-range values cause a ValueError to be raised.
- CONFORM: any out-of-range bits are silently discarded, leaving only bits which are valid members of the Flag enumeration.
- EJECT: the value loses its Flag membership entirely, and is converted to a plain int and returned as such.
- KEEP: the out-of-range bits are kept as part of the value; this is the default behaviour for IntFlag.

The snippet below demonstrates these cases.
>>> class StrictEnum(enum.Flag, boundary=enum.STRICT):
... ONE = enum.auto()
...
>>> StrictEnum(2**2)
Traceback (most recent call last):
# (Traceback removed for brevity)
ValueError: <flag 'StrictEnum'> invalid value 4
given 0b0 100
allowed 0b0 001
>>>
>>> class ConformEnum(enum.Flag, boundary=enum.CONFORM):
... ONE = enum.auto()
...
>>> ConformEnum(1 + 2**2 + 2**3)
<ConformEnum.ONE: 1>
>>>
>>> class EjectEnum(enum.Flag, boundary=enum.EJECT):
... ONE = enum.auto()
...
>>> EjectEnum(1 + 2**2)
5
>>>
>>> class KeepEnum(enum.Flag, boundary=enum.KEEP):
... ONE = enum.auto()
...
>>> KeepEnum(1 + 2**2 + 2**3)
<KeepEnum.ONE|12: 13>
@verify

There's a new @verify
decorator which can impose specific constraints on the enumeration you declare. The constraint you impose is controlled by passing a parameter to the decorator which is a member of the EnumCheck
enumeration. The values, and the conditions they impose, are:
- UNIQUE: ensures that each value has only a single name, i.e. there are no aliases.
- CONTINUOUS: ensures that there are no missing values between the lowest and highest members.
- NAMED_FLAGS: applicable only to Flag and IntFlag, ensures that for any values which refer to multiple flags (i.e. a value with multiple bits set), all of the values correspond to valid members of the enumeration.

These three cases are demonstrated below by showing cases which fail their validations.
>>> @enum.verify(enum.UNIQUE)
... class MyEnum(enum.Enum):
... ONE = 1
... TWO = 2
... EIN = 1
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/andy/.pyenv/versions/3.11.1/lib/python3.11/enum.py", line 1827, in __call__
raise ValueError('aliases found in %r: %s' %
ValueError: aliases found in <enum 'MyEnum'>: EIN -> ONE
>>>
>>> @enum.verify(enum.CONTINUOUS)
... class MyEnum(enum.Enum):
... ONE = 1
... TWO = 2
... FOUR = 4
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/andy/.pyenv/versions/3.11.1/lib/python3.11/enum.py", line 1848, in __call__
raise ValueError(('invalid %s %r: missing values %s' % (
ValueError: invalid enum 'MyEnum': missing values 3
>>>
>>> @enum.verify(enum.NAMED_FLAGS)
... class MyEnum(enum.Flag):
... ONE = 1
... TWO = 2
... THREE = 4
... FOUR = 8
... ONE_AND_TWO = 3
... UNKNOWN_AND_FOUR = 24
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/andy/.pyenv/versions/3.11.1/lib/python3.11/enum.py", line 1881, in __call__
raise ValueError(
ValueError: invalid Flag 'MyEnum': alias UNKNOWN_AND_FOUR is missing value 0x10 [use enum.show_flag_values(value) for details]
@member and @nonmember

Typically, every attribute of an enumeration subclass is converted to a member of the enumeration. However, there may be circumstances where you want to define, say, nested classes or other attributes.
I’ll be honest, I struggled to think of concrete cases where this would be useful, since I don’t tend to pile additional functionality into my enumerations — they’re almost always just bare classes which are members of another class or module which provides the related functionality. But I think it’s good to broaden your horizons, so in Python 3.11 it’s possible to specify whether or not a given attribute should be a member of the enumeration using the member()
and nonmember()
functions.
This would probably take five times as long to explain as demonstrate, so hopefully the snippet below will make things a little clearer. But even if it doesn’t, I suspect the chances are good that you’ll never need this anyway.
>>> class MyEnum(enum.Enum):
... MEMBER_ITEM = 100
... NON_MEMBER_ITEM = enum.nonmember(200)
...
>>> MyEnum.MEMBER_ITEM
<MyEnum.MEMBER_ITEM: 100>
>>> MyEnum.NON_MEMBER_ITEM
200
>>> MyEnum.__members__
mappingproxy({'MEMBER_ITEM': <MyEnum.MEMBER_ITEM: 100>})
@property

Similar to the builtin @property decorator, there's a new @enum.property decorator specific to enumeration classes. The purpose of this is to define properties in a way which won't clash with enumeration members, even if the names are the same. The only requirement is that the two definitions don't occur in the same class — the property is defined in a base class.
Similar to @nonmember
, I struggle a little to think of real world cases where you’d want to merge enumeration functionality into another class, but if that sort of thing floats your boat then you can see an example of its use below.
>>> class MyBaseEnum(enum.Enum):
... @enum.property
... def ATTR(self):
... return 123
...
>>> class MyDerivedEnum(MyBaseEnum):
... ATTR = 456
...
>>> MyDerivedEnum.ATTR
<MyDerivedEnum.ATTR: 456>
>>> x = MyDerivedEnum(456)
>>> x.ATTR
123
@global_enum

Another change in this release is the addition of the @global_enum
decorator. This is intended for cases where the enumeration values will be promoted to module-level names, and it modifies the repr()
and str()
results to be consistent with this. You can see the difference in the output in the snippet below.
>>> class MyNormalEnum(enum.Enum):
... ONE = 1
... TWO = 2
...
>>> repr(MyNormalEnum.ONE)
'<MyNormalEnum.ONE: 1>'
>>> str(MyNormalEnum.TWO)
'MyNormalEnum.TWO'
>>>
>>> @enum.global_enum
... class MyGlobalEnum(enum.Enum):
... ONE = 1
... TWO = 2
...
>>> repr(MyGlobalEnum.ONE)
'__main__.ONE'
>>> str(MyGlobalEnum.TWO)
'TWO'
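The decorator doesn't just change the representations, it also exports the members into the enclosing module's namespace, so continuing the session above they really are available as module-level (here __main__-level) names.

>>> ONE
__main__.ONE
>>> TWO is MyGlobalEnum.TWO
True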
When members of an enum.Flag
class cover multiple bits, it’s reasonable to assume you might be able to treat them a bit like a frozenset
of enumeration members. In support of this, Python 3.11 adds support for len()
, list()
, set()
and membership tests on these values, as demonstrated in the snippet below.
>>> class MyFlagEnum(enum.Flag):
... FOO = 1
... BAR = 2
... BAZ = 4
... ALL_ITEMS = 7
...
>>> len(MyFlagEnum.ALL_ITEMS)
3
>>> set(MyFlagEnum.ALL_ITEMS)
{<MyFlagEnum.BAZ: 4>, <MyFlagEnum.BAR: 2>, <MyFlagEnum.FOO: 1>}
>>> list(MyFlagEnum.ALL_ITEMS)
[<MyFlagEnum.FOO: 1>, <MyFlagEnum.BAR: 2>, <MyFlagEnum.BAZ: 4>]
>>> MyFlagEnum.FOO in MyFlagEnum.ALL_ITEMS
True
>>> MyFlagEnum.BAR not in MyFlagEnum.BAZ
True
Phew, that was a strong finish on the enum
module changes — I was beginning to wonder if that should have had its own whole article…!
Still, despite some of the enum changes being a little obscure, there are some useful changes buried in there. Looking back over the rest of the article, I must admit the features I'm most pleased to see are the improvements to datetime — writing code to parse dates is really dull, so I'm happy to see the need for that becoming less likely.
Next time I’ll finish off the look at Python 3.11 by covering the remaining changes of interest in the standard library.