I was slow to make the transition from Python 2 to 3 in the first place, and I never felt like I kept up properly with the new features. So I’m going to aim to do a series of articles looking at a different Python version in each and go through the new features added and catch myself up properly. This one addresses features added in Python 3.0 beyond those already in 2.6, including Unicode by default, type annotations, and exception chaining.
This is the 1st of the 34 articles that currently make up the “Python 3 Releases” series.
I’ve always had a fondness for Python, since I started using it back in around 2005 or so. The only scripting I’d done before that was Perl and PHP, which had always felt totally unsuited to anything beyond the most trivial scripts1. Python always felt like a breath of fresh air: the minimal core language harked back to the compactness of C, the first language I learned after BASIC, and compared to anything else at the time the standard library felt less like batteries included and more like a sizeable power station.
In those heady days I kept up with the releases keenly, checking out every new feature as it was added. I remember the slightly giddy glee as I built lazily evaluated lists with generator comprehensions, and even trivial features like adding timeouts to various blocking operations gave me a little thrill. Perhaps I didn’t have enough going on in my life…
Alas, the release of Python 3.0 coincided with me having less and less time to devote to such detailed following, and before you know it it’s well over a decade later and there’s a mountain of new features with which I don’t feel intimately familiar. Of course, I’ve accumulated some bits and pieces of knowledge and experience on some of the more major aspects as I’ve gone along, but there’s nothing quite like a detailed trawl through the full release notes to pick up on handy tricks and features one might have missed.
This is the first in a series of articles where I’ll attempt to (very belatedly!) catch myself up on the latest and greatest. I’m going to start by examining all the major new features in Python 3.0 in this article, and then take a new release in each subsequent one until I’m all caught up with 3.92. I may not go through every little change, especially in the standard library, but I’ll try to cover all the changes that are noteworthy.
Since the potential scope of this article is all of the changes from Python 2.6 to 3.0, it’s probably going to be the one that’s most likely to make me wish I’d broken it up into smaller pieces. So let’s get going before I change my mind, and dive into Python 3!
We’ll start with one of the most straight-forward changes: the keyword print became the function print(). This was at the same time both pleasing, as it was one less irksome special case to worry about; but also slightly vexing, as my muscle memory took quite some time to adjust. Still, it was absolutely the right thing to do, and I for one certainly haven’t missed the inconsistent use of the >> operator for redirecting print output.
On a rather more substantial note, many of the functions that used to return lists now return iterators. This addressed some rather ugly duplication such as having both range() and xrange(), and dict having both items() and iteritems(). In most cases you want an iterator anyway, so you don’t have to buffer up potentially large lists in memory, and if you really do want a real list you can just pass the iterator to the list() constructor. If you’re doing that just to sort it, though, then sorted() probably does what you want.
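For example:

>>> range(5)
range(0, 5)
>>> list(range(5))
[0, 1, 2, 3, 4]
>>> sorted(range(5, 0, -1))
[1, 2, 3, 4, 5]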
So far so simple, but there’s some interesting detail here for dict which may not be immediately apparent. The three methods dict.keys(), dict.values() and dict.items() don’t actually return just simple iterators but instead return views. Unlike a generator these can be iterated repeatedly, and furthermore they remain linked with the original dict such that updates to the original will immediately be reflected in the view. That said, they do suffer the usual limitation that the dict can’t change size while it’s being iterated or it’ll raise a RuntimeError.
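A quick demonstration of the live view behaviour:

>>> d = {"a": 1}
>>> keys = d.keys()
>>> d["b"] = 2
>>> keys
dict_keys(['a', 'b'])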
The functions map() and filter() were both updated to return iterators instead of lists, although many of the cases where you might be tempted to use these are more readably implemented as a list comprehension or generator expression instead.
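For example, these two produce the same result, but the comprehension reads rather more naturally:

>>> list(map(lambda x: x * 2, [1, 2, 3]))
[2, 4, 6]
>>> [x * 2 for x in [1, 2, 3]]
[2, 4, 6]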
Finally, xrange() was renamed to range() to replace the original list version, and zip() also returns an iterator.
These changes mean that it only takes a little care and you can structure your code as almost entirely lazily-evaluated iterators to build up some pretty complex processing chains with a minimum of complexity in the code. I was a big fan of all these changes.
One change that has a bit more potential to break things was that the ordering comparison operators <, <=, > and >= will now raise TypeError where there is no obvious natural order. For example, 3 > None will error out, as will 2 < "3". This means that heterogeneous lists can’t necessarily be sorted, as not all the elements are likely to be comparable to each other. That said, it’s hard to think of any such cases which aren’t the result of poor design somewhere.
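For example (the exact wording of the message has varied between versions; this is from a recent CPython):

>>> 3 > None
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '>' not supported between instances of 'int' and 'NoneType'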
A related change that took me a little while to adjust to was the loss of the cmp parameter to sorted() and list.sort(). Instead the key parameter can be used to supply a function to convert the value to a “sort equivalent” value. In general any sensible cases are quite easy to convert between these two, but it took a bit of a shift in thinking at times.
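For example, a case-insensitive sort expressed with key:

>>> sorted(["banana", "Apple", "cherry"], key=str.lower)
['Apple', 'banana', 'cherry']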
Both cmp() and __cmp__() also vanished (well, were deprecated at least). The former was mostly only useful in building comparison methods to pass as the cmp parameter to sorting functions anyway, so with these removed the need for cmp() was basically gone.
The loss of the __cmp__() method was a bit more irritating, as implementing the rich comparison operators such as __lt__() and __ge__() gets a little tedious if you want to implement several classes which support all six such methods. These days the functools.total_ordering decorator makes this rather less cumbersome, but that wasn’t added until Python 3.2 so let’s not get ahead of ourselves. At least != returns the inverse of == by default, unless the latter returns NotImplemented3.
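To illustrate the tedium, here’s a minimal sketch of a fully ordered class (the class itself is my own contrivance) with some of the six methods written out by hand:

class Version:
    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __eq__(self, other):
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other):
        return (self.major, self.minor) < (other.major, other.minor)

    def __le__(self, other):
        return self < other or self == other

    # ... and __ne__(), __gt__() and __ge__() along the same lines.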
Whilst I understand why __cmp__() was deprecated in favour of the rich comparison operators, as it’s important to support partially ordered types for some cases, this is one case where I feel a little conflicted as __cmp__() was genuinely useful for the common case of fully ordered classes.
A number of changes impacted the int and long types which are worth being familiar with, as arithmetic is the bread and butter of so much code.
Firstly, the long type is no more, as it’s been rolled into a unified int type whose backend storage is seamlessly converted as necessary. This was very pleasant, as it always felt oddly low-level for Python to have exposed C’s confusingly inconsistent int size across platforms. In the same vein, sys.maxint was removed as there is effectively no longer a limit to the value of an int. However, it’s still sometimes useful to have a value larger than the size of any container and sys.maxsize was retained for this. Generally I’d say having None removes most of the need for this, but it’s there if you find it useful.
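For example, integers now grow seamlessly past the old machine-word limits:

>>> 2 ** 100
1267650600228229401496703205376
>>> type(2 ** 100)
<class 'int'>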
Several types have new ways to specify literals or other expressions.
The new literals for octal (e.g. 0o644) and binary (e.g. 0b11001) were added in Python 2.6, but now the old form of octal literals (e.g. 0644) is no longer supported. There’s a new bin() function to convert an integer to a binary string, similar to oct() for octal and hex() for hexadecimal.
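For example:

>>> bin(42)
'0b101010'
>>> oct(420)
'0o644'
>>> hex(255)
'0xff'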
There’s also a new more concise format for set literals which is {1, 2, 3, 4}. Note, however, that {} is still an empty dict so you’ll need to use set() for that case. This can also be used as a set comprehension, as in {i**2 for i in range(10) if i % 2 == 0}.
In a similar vein there’s also a format for dict comprehensions, so you can do things like this totally not in any way contrived example:

>>> {i: i ** 2 for i in range(5)}
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
Now we’re getting to what is, in my opinion, one of the most impactful changes in Python 3.
In Python 2, there were two types: str represented a sequence of bytes, and unicode represented a series of Unicode code points, which was independent of any particular encoding. Since data invariably comes into an application as bytes, there was always a bit of a dance where everything coming in should immediately have been decoded to unicode, using some out-of-band mechanism to determine the correct encoding to use; and all data being sent out had to be re-encoded, again using some particular encoding as required.
Whilst this all seems simple enough, in practice it led to an awful lot of bugs. Typically programmers learned the str type and didn’t know anything about unicode until later, after the use of str was baked into all sorts of awkward parts of their system. Even if the programmer was knowledgeable enough to know to convert to unicode everywhere, there’s often not enough information available to select the best encoding both on input and output. The bugs created by this sort of issue often wouldn’t manifest until much later, as an application was exposed to users from other countries, leading to lots of highly painful bugs found in production setups.
Python 3 can’t solve all of these issues, but what it does do is force the programmer to deal with them rather more explicitly. This is done by renaming the unicode type to str, and storing bytes in a new bytes type which is also immutable. The key point is that, unlike in Python 2, you can’t mix these types. Previously things would all work for simple ASCII cases, where Python 2 would translate between str and unicode as required. This is where the “found in production” bugs creep in, unless you’re rigorous in your test cases. In Python 3, however, if you attempt to mix these types you’ll invariably get an exception. This forces the programmer to do explicit conversions when reading or writing byte-oriented storage (by which I mean, pretty much all storage).
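For example (again, the exact message shown is from a recent CPython):

>>> "one" + b"two"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate str (not "bytes") to str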
Along with this change, u"..." literals ceased to be valid, as there is no longer a unicode type, and standard "..." literals are now interpreted as Unicode str types. To specify bytes explicitly, a new b"..." literal was added. The old basestring, which used to be a base class for both str and unicode, has been removed as it no longer makes sense to treat str and bytes interchangeably with the new cleaner distinction between them.
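The conversions between the two are done with encode() and decode(), always with respect to some specific encoding:

>>> "café".encode("utf-8")
b'caf\xc3\xa9'
>>> b'caf\xc3\xa9'.decode("utf-8")
'café'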
The upshot is that all code now needs to be written to be totally explicit about whether it’s dealing with text, which has no definite size in bytes, or bytes, which is just a sequence of 8-bit values that has no specific interpretation as text. It’s important to remember that this isn’t a problem that Python 3 has created for programmers, it’s simply one that everyone should have always dealt with but has managed to avoid early on, only to cause undue pain later. Trust me, I speak from bitter experience of retrofitting i18n to a fairly large system, it’s not something you’ll thank your past self for.
One last note on all this is that there’s also a bytearray builtin type which is a mutable version of the bytes type. This was actually added in Python 2.6, so is strictly outside the scope of this article, but it’s given new relevance with the bytes type added in Python 3.0, and also the semantics deserve some clarification.
Essentially, if you index this type, you’ll get an int in the range 0-255, and if you want to set something you’ll need to pass one. Unlike Python 2.x, you can’t pass a str or bytes for this purpose any more4. However, if you want to extend it then you’ll need to pass a bytes or another bytearray. You can’t pass a str, since this isn’t composed of bytes but a sequence of Unicode code points; you’d need to choose an encoding and convert to a bytes first.
>>> x = bytearray()
>>> x += "Knights who say"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can't concat str to bytearray
>>> x += b"Knights who say"
>>> x.append(b"N")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'bytes' object cannot be interpreted as an integer
>>> x.append(32)
>>> x.append(78)
>>> x.append(105)
>>> x
bytearray(b"Knights who say Ni")
>>> x[2]
105
All these changes to support Unicode are particularly relevant when dealing with files. The first thing to note is that the mode in which you open your file is now more relevant than it used to be on some platforms.
To recap, open() takes a filename as its first parameter and a mode as its second. Everyone is probably used to specifying this as r to read, w to write (creating a new file or truncating any existing file to zero length) or a to append (open for write but don’t truncate). You may or may not be aware there’s also an x mode, which creates a new file and opens it for writing, raising FileExistsError if that file already exists. On top of any of these you can add + to open for both reading and writing, with other behaviours such as creating or truncating the file remaining the same. Finally you can add a b to open the file in binary mode, or a t for text mode, although since this is the default most people omit it.
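For example, the x mode in action, assuming example.txt doesn’t already exist:

>>> f = open("example.txt", "x")
>>> f.close()
>>> f = open("example.txt", "x")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileExistsError: [Errno 17] File exists: 'example.txt'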
In Python 2 the difference between binary and text was essentially whether line-ending conversion was done on platforms which used CRLF conventions (i.e. Windows), and Unix users typically didn’t need to worry about the distinction. In Python 3 the difference is more pronounced: if you open a file in text mode then expect to use str objects for read and write, whereas in binary mode you need to use bytes objects. As per the earlier discussion these are not interchangeable; you must remain consistent with the mode you’ve used or expect exceptions.
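For example, writing a str to a file opened in binary mode (message from a recent CPython):

>>> with open("out.bin", "wb") as f:
...     f.write("some text")
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
TypeError: a bytes-like object is required, not 'str'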
Opening a file in binary mode is straightforward, as you’d expect. After all, an 8-bit value is an 8-bit value anywhere in the world. The interesting cases come with text mode — since Python is always dealing with bytes when it talks to the OS, it always needs some sort of encoding to do this translation in both directions. It’s important to bear in mind this is true regardless of whether you’ve supplied such an encoding.
Ideally you know what encoding to choose, either because it’s been supplied with the text out-of-band (e.g. in a HTTP header), or because the file you’re reading is supposed to follow some pre-defined standard which fixes the encoding. In this case, you can supply the encoding parameter to open() and all is good. If you don’t, the system default encoding will be used as per locale.getpreferredencoding(), which is likely to work in many cases but definitely cannot be relied upon.
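For example, writing with one encoding and then (mistakenly) reading with another:

>>> with open("data.txt", "w", encoding="utf-8") as f:
...     f.write("café")
...
>>> open("data.txt", encoding="latin-1").read()
'cafÃ©'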
One other option is to use the same encoding detection that Python itself uses to read source files via tokenize.detect_encoding()5. This will look for a BOM or special cookies as defined in PEP 263. That said, in my experience it’s pretty rare for content to contain such helpful markers on many platforms.
A final note on a feature which is actually present already in Python 2.6, but I don’t know how many people are aware of it: there’s a newline parameter to control newline translation for files opened in text mode. This defaults to None which enables universal newlines mode, which translates system-specific line-endings6 into \n on input and does the reverse translation on output. This means that any of the builtin functions that deal with lines will respect any of the possible line endings, and is a sensible default. However, there may be times you need to generate files which may be read on platforms other than your own, and in these cases you have a few other choices.
If you pass newline="" to open() then universal newline mode is enabled to detect newlines on input, but the line endings will be passed to you without translation. Passing this value on output will disable any translation of line endings during write. Passing any of \n, \r or \r\n as the value of newline will treat that sequence as the only valid line-ending character to respect for reading this file, and it will be translated to and from \n if required on input and output.
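As a sketch of how this might be used, here’s one way to force CRLF line endings on output regardless of platform (the filename and content are just for illustration):

with open("dos.txt", "w", newline="\r\n") as f:
    # Each \n below is translated to \r\n on the way out.
    f.write("first line\nsecond line\n")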
All this said, it seems like we need to make sure our code handles encoding/decoding errors gracefully if we care about our application’s stability. If you read or write bytes, there’s no way the content can be invalid, but if you’re reading in (say) UTF-8 then there are byte sequences which are simply invalid. Unless you want any user submitting content to be able to crash your application with an unhandled exception, you’d better do something about it.
So what to do? The first option is to simply make sure you handle UnicodeEncodeError and UnicodeDecodeError gracefully anywhere you’re doing the conversion, either explicitly or implicitly. Usefully, these are both subclasses of UnicodeError, itself a subclass of ValueError, so there are several layers of granularity you can use.
This policy of raising exceptions on encoding/decoding errors can be changed, however, by supplying the errors parameter to open() and some of the other functions which interact with the codecs module. By default this is strict, which means to raise the exceptions, but in Python 3.0 you can also supply any of the following7:

strict to raise an exception on any invalid sequence, as described above (the default).
ignore to simply skip any invalid sequences entirely.
replace to replace bad characters with ? on encoding and with the official U+FFFD replacement character on decoding.
xmlcharrefreplace to replace unencodable characters with the appropriate XML character reference on encoding.
backslashreplace to replace bad characters with Python’s backslashed escape sequences.
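To see a few of these in action on an invalid UTF-8 sequence:

>>> b"caf\xe9".decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 3: unexpected end of data
>>> b"caf\xe9".decode("utf-8", errors="ignore")
'caf'
>>> b"caf\xe9".decode("utf-8", errors="replace")
'caf�'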
Later versions of Python add a couple more options, which I’ll try and remember to discuss in the appropriate article, but the full list for any version can be found in the documentation for the codecs module.
Truly the Unicode cup runneth over in Python 3; there are yet a few more wrinkles to iron out. All the discussion so far has talked about file content, but files also have names and these are strings — how does Unicode affect these?
In Python 3 these are generally treated as str so that filenames with arbitrary Unicode characters are permitted. However, this can cause problems on platforms where filenames are instead arbitrary byte strings, and so there may not be a valid translation to Unicode. As a result, many of the APIs that accept filenames as str will also accept them as bytes, and sometimes this can change their behaviour. For example, os.listdir() normally returns a list of str, but if you pass a bytes parameter then you’ll get a list of bytes instead. Some functions also have bytes alternatives, such as os.getcwdb() which is like os.getcwd() in every respect except that it returns bytes instead of str.
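For example, in a directory containing just a hypothetical notes.txt:

>>> import os
>>> os.listdir(".")
['notes.txt']
>>> os.listdir(b".")
[b'notes.txt']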
All this sounds rather tedious, I’ll be honest, and just treating everything as Unicode sounds much more pleasant to me. My strong advice to anyone is to keep control of the filenames you need to deal with so that you never run into these issues. If you really need to support user-specified filenames (e.g. you’re building your own cloud storage offering) then store these as metadata in some database and generate the filename yourself as (e.g.) the SHA-256 of the file content.
It’s worth mentioning that this issue doesn’t necessarily just occur with filenames, but also things like os.environ and sys.argv. In general I suspect that there’s not a lot to be done about these cases except fail early and obviously so the user can take corrective action before they’ve wasted too much time.
There were some changes to the way function parameters and return values are specified.
The first of these is annotations, specified in PEP 3107. These don’t make any functional change at runtime, but these annotations can be used by third party tools to perform type-checking or other functions. The annotations can be any Python expression, such as strings for documentation:

def my_function(filename: "name of the file to process",
                encoding: "text encoding of the file content",
                block_size: "size in bytes of each block"):
Or it could be types, for tools that do data flow analysis and type checking:
def my_function(filename: str, encoding: str, block_size: int):
We’ll come back to this feature more in the discussion of Python 3.5 where the typing module was added.
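At runtime the annotations are simply collected into the function’s __annotations__ attribute, where tools can inspect them:

>>> def my_function(filename: str, encoding: str, block_size: int):
...     pass
...
>>> my_function.__annotations__
{'filename': <class 'str'>, 'encoding': <class 'str'>, 'block_size': <class 'int'>}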
As well as annotations, there were some changes in PEP 3102 to allow keyword arguments to be specified after varargs-style parameters. Imagine you want to write a function that takes a variable number of positional parameters, but you also want to allow keyword arguments to specify optional flags. In Python 2 your only option was this:
def process(*paths, **options):
    verbose = options.pop("verbose", False)
Python 3 introduced a small syntax tweak to allow you to do this:
def process(*paths, verbose=False):
    ...
This effectively makes the parameters after the varargs keyword-only. But what if you want to do this without actually allowing varargs positional arguments? You can do this with a bare *, which won’t accept any parameters like *args, but will still flag any remaining arguments as keyword-only:
def connect(host, port, *, timeout=None):
    ...
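Using the hypothetical connect() above, a stray positional argument is rejected while the keyword form works fine:

>>> def connect(host, port, *, timeout=None):
...     pass
...
>>> connect("example.com", 80, 5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: connect() takes 2 positional arguments but 3 were given
>>> connect("example.com", 80, timeout=5)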
There are a number of changes to tidy up the use of exceptions in Python 3, making life easier for everyone.
Firstly, it’s now mandatory that exception classes are derived (directly or indirectly) from BaseException. This was always a good idea, it just wasn’t mandatory until now. That said, you almost certainly want to derive from Exception instead, as BaseException is generally reserved for things that you actually want to bypass your handlers and proceed to the outer scope — SystemExit, KeyboardInterrupt and GeneratorExit. Believe me, you don’t want the pain of tracking down a bug where you’ve accidentally caught GeneratorExit because you forgot it was implemented with an exception.
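In practice this means a catch-all handler should look something like this sketch, where do_work() and the error reporting are placeholders of mine:

try:
    do_work()
except Exception as exc:
    # SystemExit, KeyboardInterrupt and GeneratorExit derive directly from
    # BaseException, so they sail straight past this handler.
    print("Operation failed:", exc)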
As per PEP 3109, exceptions must be constructed as other classes are, so use raise Exception(args) instead of the old raise Exception, args form, which is no longer supported.
In a similar vein, PEP 3110 updated the syntax for catching exceptions so that except MyException, exc is no longer valid; the cleaner syntax except MyException as exc must now be used. Slightly more subtly, the scope of the variable to which the exception is bound is now limited to the handler itself.
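This can catch you out if you try to refer to the exception after the handler has finished:

>>> try:
...     raise ValueError("boom")
... except ValueError as exc:
...     pass
...
>>> exc
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'exc' is not defined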
On a more fundamental level, PEP 3134 adds exception chaining support. This generally occurs when an exception is raised in the handler of a previous exception. In Python 2 the current exception being handled was effectively a global singleton, so the original exception information was lost to be replaced by the second one. This was pretty annoying, since generally the traceback from the original exception is going to be the one that helps you track down the bug. A common case of this is where library owners “helpfully” wrap system exceptions in their own errors; this is great for encapsulation outside the library, but makes it really painful to track down bugs in the library itself.
With exception chaining, however, no exceptions are lost. Instead, the original exception is saved under the __context__ attribute of the new exception. This can occur several times in a row, hence the term “chaining”. As well as being available to application code via __context__, the default exception logging also does a good job of presenting the full context:
>>> try:
... raise Exception("one")
... except Exception as exc1:
... try:
... raise Exception("two")
... except Exception as exc2:
... raise Exception("three")
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
Exception: one
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 5, in <module>
Exception: two
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 7, in <module>
Exception: three
As well as the implicit chaining of raising exceptions in a handler, it’s also possible to explicitly chain exceptions with raise NewException() from exc. This is broadly similar, but stores the original exception under the __cause__ attribute instead of __context__. This also subtly changes the output in the default handler:
>>> try:
... raise Exception("one")
... except Exception as exc:
... raise Exception("two") from exc
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
Exception: one
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
Exception: two
The same PEP also adds a __traceback__ attribute to exception objects for better encapsulation. This is cleaner than having to dig it out of sys.exc_info(), especially where you now have multiple exceptions floating around at the same time.
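For example, binding the exception to another name (since, as mentioned above, the handler’s variable goes out of scope) and then formatting its traceback:

>>> import traceback
>>> try:
...     raise ValueError("boom")
... except ValueError as exc:
...     caught = exc
...
>>> traceback.format_tb(caught.__traceback__)
['  File "<stdin>", line 2, in <module>\n']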
Python 3.0 brings a new approach to string formatting which is described in PEP 3101. Even if you’ve never used Python 3 you may well already be familiar with it since it was actually added in Python 2.6. I’ll give it a brief run through here, though, since it’s mentioned in the Python 3.0 release notes.
One point I found amusing whilst going back through the old release notes was the certainty with which the % operator was going to be deprecated in Python 3.1 and removed shortly after. Now we’re well over a decade and getting on for ten releases later and we still don’t seem to be any closer to removing it. I can understand why; I’m sure there’s a large amount of code which would need to be painstakingly updated, and it’s not something that lends itself to reliable automatic conversion.
In any case, the approach isn’t too dissimilar to the old printf() syntax, but it’s more flexible. The simplest form is actually quite similar to the printf() version, except that instead of %s and the like then {n} is used, where n is the number of the argument to use, and instead of the % operator the arguments are passed to the str.format() method. More readably, names can be used instead of numbers along with keyword parameters to format().
Both of the following will produce the same output:
>>> "My name is {1} and I'm {0} years old".format(40, "Brian")
"My name is Brian and I'm 40 years old"
>>> "My name is {name} and I'm {age} years old".format(name="Brian", age=40)
"My name is Brian and I'm 40 years old"
On top of the basic field references, two explicit conversions are recognised, where !s can be appended to convert the value with str() and !r can be appended to convert the value with repr(). This overrides the default formatting for the type, and !r is useful for (e.g.) diagnostic output.
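For example:

>>> "{0!s} has the repr {0!r}".format("spam")
"spam has the repr 'spam'"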
The final item is a formatting specifier, which is of the form [[fill]align][sign][#][0][minimumwidth][.precision][type]. Instead of duplicating the documentation by taking you through all the options, especially when you may well already be familiar, I’ll limit myself to some examples:
>>> # Exponent notation with precision 3.
>>> "{0:.3e}".format(123456789)
'1.235e+08'
>>> # General float format, min. length 8, precision 4.
>>> "{0:8.4}".format(12.345678)
'   12.35'
>>> # Hex format, centred in 10 chars using '\' as padding.
>>> "{0:\^+#10x}".format(1023)
'\\\\+0x3ff\\\\'
>>> # Format float as percentage, min. length 7, precision 2.
>>> "{0:7.2%}".format(0.123456789)
' 12.35%'
>>> # This requires locale to be set first...
>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'en_GB.UTF-8'
>>> # Use locale-specific number separator.
>>> "{0:n}".format(1234567)
'1,234,567'
User-defined types can define a __format__() method to override the formatting for them, in a similar way to the existing __str__() and __repr__() special methods.
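As a sketch of how __format__() might be used (the class and its Fahrenheit conversion are my own contrivance):

class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, spec):
        # Interpret a trailing "F" as a request for Fahrenheit, deferring
        # the rest of the spec to the underlying float formatting.
        if spec.endswith("F"):
            return format(self.celsius * 9 / 5 + 32, spec[:-1])
        return format(self.celsius, spec)

With this in place, "{0:.1fF}".format(Temperature(100)) yields '212.0'.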
There are also ways to override the formatting more globally, but that’s a little esoteric and outside the scope of this already rather too long article — let’s move on.
There have been some other changes to operators and special methods.
The special methods for slices __getslice__(), __setslice__() and __delslice__() have been removed, which is a pleasant simplification of what’s becoming quite a massive set of special methods. Instead the standard __getitem__(), __setitem__() and __delitem__() methods will be passed a slice object containing whichever of start, stop and step size have been specified.
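A quick way to see what gets passed:

>>> class Demo:
...     def __getitem__(self, key):
...         return key
...
>>> Demo()[1:10:2]
slice(1, 10, 2)
>>> Demo()[5]
5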
The next() method has been renamed to __next__() for consistency. A new builtin next() has been added to call this method in the same way that iter() already calls __iter__().
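For example:

>>> it = iter([1, 2])
>>> next(it)
1
>>> it.__next__()
2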
The __oct__() and __hex__() special methods have been removed in favour of just supplying __index__(), which returns an integer used to populate the results of bin(), oct() and hex().
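For example, with a contrived class of my own:

>>> class Answer:
...     def __index__(self):
...         return 42
...
>>> hex(Answer())
'0x2a'
>>> bin(Answer())
'0b101010'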
The __members__ and __methods__ attributes are no longer supported, and attributes of the form func_X have been renamed to __X__ for consistency and to avoid polluting the user-definable namespace. This specifically refers to __closure__, __code__, __defaults__, __dict__, __doc__, __globals__ and __name__.
Finally, __nonzero__ is now __bool__.
This means the full list of special methods is now a little more manageable in Python 3.0 with a little less duplication. Still pretty busy, but that’s probably inevitable given the degree of flexibility Python offers:

__new__(), __init__() and __del__().
__repr__(), __str__() and __format__().
__lt__(), __le__(), __eq__(), __ne__(), __gt__() and __ge__().
__hash__().
__bool__().
__getattr__(), __getattribute__(), __setattr__(), __delattr__() and __dir__().
__get__(), __set__() and __delete__().
__slots__().
__call__().
__len__(), __getitem__(), __setitem__(), __delitem__(), __iter__(), __reversed__() and __contains__().
__neg__(), __pos__(), __abs__() and __invert__().
__complex__(), __int__(), __float__() and __round__().
__index__().
__enter__() and __exit__().
__add__(), __sub__(), __mul__(), __truediv__(), __floordiv__(), __mod__(), __divmod__(), __pow__(), __lshift__(), __rshift__(), __and__(), __xor__() and __or__().
__radd__(), __rsub__(), __rmul__(), __rtruediv__(), __rfloordiv__(), __rmod__(), __rdivmod__(), __rpow__(), __rlshift__(), __rrshift__(), __rand__(), __rxor__() and __ror__().
__iadd__(), __isub__(), __imul__(), __itruediv__(), __ifloordiv__(), __imod__(), __ipow__(), __ilshift__(), __irshift__(), __iand__(), __ixor__() and __ior__().
.I realise that calling 80 methods “manageable” may seem a little far-fetched, but at least there’s been some progress.
Here’s a few more things that I wanted to mention, but didn’t seem to fit neatly into their own category.
Firstly, there’s a new scope. Variables are still bound in local scope by default, and global still binds them to the global scope. Python 3 additionally adds the nonlocal keyword as specified in PEP 3104. This means that a nested function can declare a variable as referring to an outer scope without being constrained to only the global scope. This finally brings proper nested scopes to Python as many other languages already enjoy (C, JavaScript, Ruby, etc.).
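For example, a counter built from a closure:

>>> def counter():
...     count = 0
...     def increment():
...         nonlocal count
...         count += 1
...         return count
...     return increment
...
>>> tick = counter()
>>> tick()
1
>>> tick()
2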
There’s also a neat change to unpacking iterables, so you can embed a “rest” argument when unpacking which consumes any remaining arguments. This can be at the start, middle or end of the lvalue list:
>>> (first, second, *rest, last) = range(6)
>>> print(first, second, rest, last, sep='\n')
0
1
[2, 3, 4]
5
There’s a new version of super() which can be invoked without arguments in a regular instance method inside a class definition, and it will automatically select the correct class and instance to call into. The behaviour of super() with arguments supplied is unchanged.
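A quick sketch of the zero-argument form (the classes are my own):

class Base:
    def greet(self):
        return "hello"

class Derived(Base):
    def greet(self):
        # Previously this had to be super(Derived, self).greet().
        return super().greet() + " world"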
Sticking with classes, if you’re a fan of metaclasses there’s a cleaner syntax for specifying them. Instead of the Python 2 version:
class MyClass(Base):
    __metaclass__ = MyMeta
    ...
… you must now use the more concise and consistent Python 3 version:
class MyClass(Base, metaclass=MyMeta):
    ...
The builtin raw_input() has been renamed input() to replace the original. This was always a dangerously tempting function for Python newbies who didn’t understand the stability and security implications of evaluating arbitrary user input in Python. If you really want the old behaviour, you can still eval(input()) (but you probably don’t).
There’s also been some tidying up of builtins:

intern() is now sys.intern().
reload() is now imp.reload().
apply() is gone; instead of apply(func, args) use func(*args).
callable() is gone; use hasattr(func, "__call__").
coerce() is gone; it’s no longer required now old-style classes are gone.
execfile() is gone; instead of execfile(filename) use exec(open(filename).read()).
file is gone; now you must use open().
reduce() is gone; use functools.reduce() if you must, but it’s probably more readable just to use an explicit loop.
dict.has_key() is gone; just use the in operator.

Some of the standard library moved around or was removed in Python 3, although there’s nothing I regard as particularly controversial and I’m mostly just mentioning it for completeness. I’m not going to try to pick through everything but some examples that jumped out at me:
The StringIO and cStringIO modules are gone, replaced by io.StringIO and io.BytesIO for text and data respectively.
The md5 module is gone, but hashlib has all the functionality you need from it.
The bsddb3 package was removed from the standard library, to ease maintenance burden, but it’s still available externally.
Various modules have been renamed to lowercase: ConfigParser is now configparser, Queue is queue, SocketServer is socketserver and so on.
cPickle should never be used directly; pickle will choose the best available implementation at import time.

Many modules have been grouped to keep things organised. Some examples:
http now contains what were httplib, BaseHTTPServer, CGIHTTPServer, SimpleHTTPServer, Cookie and cookielib as submodules.
urllib now contains what were urllib, urllib2, urlparse and robotparser as submodules.
xmlrpc now contains what were xmlrpclib, DocXMLRPCServer and SimpleXMLRPCServer as submodules.
The sets module is gone, as it’s no longer needed given that set() and frozenset() are builtins.
string.letters, string.lowercase and string.uppercase are all gone along with their locale-specific behaviour, to be replaced by the more consistently defined string.ascii_letters, string.ascii_lowercase and string.ascii_uppercase.
__builtin__ has been renamed to builtins.
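In practice the old imports map across like this (submodule names as per the current standard library):

>>> from http.client import HTTPConnection    # previously: import httplib
>>> from urllib.parse import urlparse         # previously: from urlparse import urlparse
>>> from xmlrpc.client import ServerProxy     # previously: import xmlrpclib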
If you’ve waded through all the above and made it down here, I salute your tenacity; and if you’ve just skipped down here in case there were some closing comments, I can’t really blame you in the slightest — it’s been a bit of a long slog, this one.
Overall, I feel the changes in Python 3.0 were very positive indeed. They took some great opportunities to tidy up some dirty corners and the switch to Unicode by default, whilst quite a pain for existing code, is the right decision overall. It would be nicer if the whole world could just settle on a single encoding and be done with it, but that’s probably somewhat outside the remit of the Python community.
There are a few aspects I’m more ambivalent about. The new string formatting approach is more flexible (although a little less performant, I’m given to understand) than the old % formatting. However, it also lacks some of the convenience for simpler cases, and the fact that the logging library still uses % formatting under the hood is a bit inconsistent. The reason is that changing this would break the existing API and lots of code out there. At time of writing, I believe there’s still no entirely satisfactory solution to this issue.
The loss of __cmp__() is also a bit of a blow, but I’m certainly not going to lose sleep over it. I can see that the new approach is more intuitive; it’s just also a bit more verbose and cumbersome in some cases.
I’m also glad I decided to run through this, because I discovered quite a few things of which I was previously unaware, such as set comprehensions and overriding the Unicode error handling strategy. This is great because, given the long time that Python 3 has already been with us, it initially felt like reviewing the changes in Python 3.0 would be a bit of a waste of time.
I’m hoping the remaining articles in this series will be a little shorter and rather heavier on interesting new features and lighter on just tidying things up.
An opinion, it must be said, of which I’ve not been meaningfully disabused in the intervening years. ↩
Or based on my historical posting frequency, 3.10 might even have been released by that point! ↩
The NotImplemented built-in constant, used only for the rich comparison methods, shouldn’t be confused with the NotImplementedError exception, which is similar in purpose but used in other contexts. ↩
Although if you don’t mind a little hoop-jumping, and you really have a pressing desire to set a character using a bytes object, you could do so using slice notation. So whilst x[1] = b"a" might not work, x[1:2] = b"a" should do. ↩
From Python 3.2 onwards there’s a more convenient tokenize.open() which you should generally use instead of directly calling detect_encoding() yourself, but we’ll cover that in a couple of articles. ↩
In case you’re unaware, different operating systems use either \n (e.g. Unix), \r\n (e.g. Windows) or \r (e.g. MacOS prior to OS X) as a line separator. You can check your OS-specific value with os.linesep. ↩
Note that only strict and ignore apply to all encodings; the remainder only apply to text encodings, that is, those that translate between str and bytes. There are also a scattering of special encodings that are str to str or bytes to bytes, but I’m not going to discuss them further in this article as they’re a little esoteric. ↩