The second of my two articles covering features added in Python 3.3, this one talks about a large number of changes to the standard library, especially in network and OS modules. I also discuss implicit namespace packages, which are a bit niche but can be useful for maintaining large families of packages.
This is the 4th of the 34 articles that currently make up the “Python 3 Releases” series.
This is the second and final article in this series looking at new features in Python 3.3, and we'll primarily be drilling into a large number of changes to the Python libraries. There's a lot of interesting stuff to cover on the Internet side, such as the new ipaddress module and changes to email, and also in terms of OS features, such as a slew of new POSIX functions that have been exposed.
There are a few module changes relating to networking and Internet protocols in this release.
There's a new ipaddress module for storing IP addresses, as well as other related concepts like subnets and interfaces. All of the types have IPv4 and IPv6 variants, and offer some useful functionality for code to deal with IP addresses generically without needing to worry about the distinctions. The basic types are listed below.

**IPv4Address & IPv6Address**
The ip_address() utility function constructs the appropriate one of these from a string specification such as 192.168.0.1 or 2001:db8::1:0.

**IPv4Network & IPv6Network**
The ip_network() utility function constructs one of these from a string specification such as 192.168.0.0/28 or 2001:db8::1:0/56. One thing to note is that because this represents an IP subnet rather than any particular host, it's an error for any of the bits to be non-zero in the host part of the network specification.

**IPv4Interface & IPv6Interface**
The ip_interface() utility function constructs one of these from a string specification such as 192.168.1.20/28. Note that unlike the specification passed to ip_network(), this has non-zero bits in the host part of the specification.

The snippet below demonstrates some of the attributes of address objects:
>>> import ipaddress
>>> x = ipaddress.ip_address("2001:db8::1:0")
>>> x.packed
b' \x01\r\xb8\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00'
>>> x.compressed
'2001:db8::1:0'
>>> x.exploded
'2001:0db8:0000:0000:0000:0000:0001:0000'
>>>
>>> x = ipaddress.ip_address("192.168.0.1")
>>> x.packed
b'\xc0\xa8\x00\x01'
>>> x.compressed
'192.168.0.1'
>>> x.exploded
'192.168.0.1'
This snippet illustrates a network and how it can be used to iterate over the addresses within it, as well as check for address membership in the subnet and overlaps with other subnets:
>>> x = ipaddress.ip_network("192.168.0.0/28")
>>> for addr in x:
... print(repr(addr))
...
IPv4Address('192.168.0.0')
IPv4Address('192.168.0.1')
# ... (12 rows skipped)
IPv4Address('192.168.0.14')
IPv4Address('192.168.0.15')
>>> ipaddress.ip_address("192.168.0.2") in x
True
>>> ipaddress.ip_address("192.168.1.2") in x
False
>>> x.overlaps(ipaddress.ip_network("192.168.0.0/30"))
True
>>> x.overlaps(ipaddress.ip_network("192.168.1.0/30"))
False
And finally, the interface can be queried for its address and netmask, as well as retrieving its specification either with a netmask or in CIDR notation:
>>> x = ipaddress.ip_interface("192.168.0.25/28")
>>> x.network
IPv4Network('192.168.0.16/28')
>>> x.ip
IPv4Address('192.168.0.25')
>>> x.with_prefixlen
'192.168.0.25/28'
>>> x.with_netmask
'192.168.0.25/255.255.255.240'
>>> x.netmask
IPv4Address('255.255.255.240')
>>> x.is_private
True
>>> x.is_link_local
False
Having implemented a lot of this stuff manually in the past, I find having it in the standard library a big convenience.
The email module has always attempted to be compliant with the various MIME RFCs3. The email ecosystem is a broad church, however, and sometimes it's useful to be able to customise certain behaviours, either to work on email held in non-compliant offline mailboxes or to connect to non-compliant email servers. For these purposes the email module now has a policy framework.
The Policy object controls the behaviour of various aspects of the email module. A policy can be specified when constructing an instance from email.parser to parse messages, when constructing an email.message.Message directly, or when serialising out an email using the classes in email.generator.
In fact Policy is an abstract base class which is designed to be extensible, but instances must provide at least the following properties:
| Property | Default | Meaning |
|---|---|---|
| max_line_length | 78 | Maximum line length, not including separators, when serialising. |
| linesep | "\n" | Character used to separate lines when serialising. |
| cte_type | "7bit" | If "8bit" is used with a BytesGenerator then non-ASCII bytes may be used. |
| raise_on_defect | False | Raise errors during parsing instead of adding them to the defects list. |
So, if you've ever found yourself sick of having to remember to override linesep="\r\n" in a lot of different places, or similar, this new approach should be pretty handy.
However, one of the main motivations for introducing this system is that it now allows backwards-incompatible API changes to be made in a way which enables authors to opt in to them when ready, but without breaking existing code. If you default to the compat32 policy, you get an interface and functionality which is compatible with the old pre-3.3 behaviour.
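Policies are also easy to derive from one another: every policy object provides a clone() method which returns a copy with the specified attributes overridden. So if you do need, say, CRLF line endings, it's a one-liner:

>>> from email.policy import default
>>> crlf_policy = default.clone(linesep="\r\n")
>>> crlf_policy.linesep
'\r\n'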
There is also an EmailPolicy, however, which introduces a mechanism for handling email headers using custom classes. This policy implements the following controls:
| Property | Default | Meaning |
|---|---|---|
| refold_source | long | Controls whether email headers are refolded by the generator. |
| header_factory | See note4 | Callable that takes name and value and returns a custom header object for that particular header. |
The classes used to represent headers can implement custom behaviour and allow access to parsed details. Here's an example using the default policy, which implements EmailPolicy with all default behaviours unchanged:
>>> from email.message import Message
>>> from email.policy import default
>>> msg = Message(policy=default)
>>> msg["To"] = "Andy Pearce <andy@andy-pearce>"
>>> type(msg["To"])
<class 'email.headerregistry._UniqueAddressHeader'>
>>> msg["To"].addresses
(Address(display_name='Andy Pearce', username='andy', domain='andy-pearce'),)
>>>
>>> import email.utils
>>> msg["Date"] = email.utils.localtime()
>>> type(msg["Date"])
<class 'email.headerregistry._UniqueDateHeader'>
>>> msg["Date"].datetime
datetime.datetime(2021, 3, 1, 17, 18, 21, 467804, tzinfo=datetime.timezone(datetime.timedelta(0), 'GMT'))
>>> print(msg)
To: Andy Pearce <andy@andy-pearce>
Date: Mon, 01 Mar 2021 17:18:21 +0000
These classes will handle aspects such as presenting Unicode representations to code, but serialising out using UTF-8 or a similar encoding, so the programmer no longer has to deal with such complications, provided they select the correct policy.
On a separate email-related note, the smtpd module now also supports RFC 5321, which adds an extension framework to allow optional additions to SMTP; and RFC 1870, which offers clients the ability to pre-declare the size of messages before sending them, so errors can be detected before a lot of data is sent needlessly.
The smtplib module also has some improvements. The classes now support a source_address keyword argument to specify the source address to use for binding the outgoing socket, for servers which have multiple potential interfaces where it's important that a particular one is used. The SMTP class can now also act as a context manager, issuing a QUIT command and disconnecting when the context expires.
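As a minimal sketch of the context manager form, assuming an SMTP server is listening on localhost (the response shown is just illustrative and will vary by server):

>>> import smtplib
>>> with smtplib.SMTP("localhost") as smtp:
...     smtp.noop()
...
(250, b'2.0.0 Ok')

The QUIT and disconnect happen automatically as the block is left, even if an exception is raised inside it.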
Also on the Internet-related front, there were a handful of small enhancements to the ftplib module.

**ftplib.FTP Now Accepts source_address**
As with smtplib above, the constructor now accepts a source_address keyword argument to control which local address the outgoing socket binds to.

**FTP_TLS.ccc()**
The FTP_TLS class, which is a subclass of FTP which adds TLS support as per RFC 4217, has now acquired a ccc() method which reverts the connection back to plaintext. Apparently, this can be useful to take advantage of firewalls that know how to handle NAT with non-secure FTP without opening fixed ports. So now you know.

**FTP.mlsd()**
An mlsd() method has been added to FTP objects which uses the MLSD command specified by RFC 3659. This offers a better API than FTP.nlst(), returning a generator rather than a list and including file metadata rather than just filenames. Not all FTP servers support the MLSD command, however.
The http, html and urllib packages also got some love in this release.
**BaseHTTPRequestHandler Header Buffering**
The http.server.BaseHTTPRequestHandler server now buffers response headers and writes them out together when end_headers() is called, rather than sending each one immediately.

**html.parser.HTMLParser Now Parses Invalid Markup**
The parser now copes with broken markup without raising exceptions, so the strict parameter of the constructor as well as the now-unused HTMLParseError have been deprecated.

**html.entities.html5 Added**
This is a dict that maps entity names to the equivalent characters, for example html5["amp;"] == "&". This includes all the Unicode characters too. If you want the full list, take a peek at §13.5 of the HTML standard.

**urllib.request.Request Method Specification**
The Request class now has a method parameter which can specify the HTTP method to use. Previously this was decided automatically between GET and POST based on whether body data was provided, and that behaviour is still the default if the method isn't specified.
The socket module also picked up a number of enhancements in this release.

**sendmsg() and recvmsg()**
Socket objects now offer the sendmsg() and recvmsg() calls, which among other things support sending and receiving ancillary data (control messages); you can find more details on the cmsg man page.
**PF_CAN Support**
The socket class now supports the PF_CAN protocol family, which I don't pretend to know much about, but it is an open source stack contributed by Volkswagen which bridges the Controller Area Network (CAN) standard for implementing a vehicle communications bus into the standard sockets layer. This one's pretty niche, but it was just too cool not to mention5.

**PF_RDS Support**
There's also support for PF_RDS, which is the Reliable Datagram Sockets protocol. This is a protocol developed by Oracle which offers similar interfaces to UDP but offers guaranteed in-order delivery. Unlike TCP, however, it's still datagram-based and connectionless. You now know at least as much about RDS as I do. If anyone knows why they didn't just use SCTP, which already seems to offer them everything they need, let me know in the comments.

**PF_SYSTEM Support**
Support was also added for PF_SYSTEM. This is a MacOS-specific set of protocols for communicating with kernel extensions6.

**sethostname() Added**
A new function sethostname() updates the system hostname. On Unix systems this will generally require running as root or, in the case of Linux at least, having the CAP_SYS_ADMIN capability.

**socketserver.BaseServer Actions Hook**
The base server now calls a service_actions() method every time around the main poll loop. In the base class this method does nothing, but derived classes can implement it to perform periodic actions. Specifically, the ForkingMixIn now uses this hook to clean up any defunct child processes.
**ssl Module Random Number Generation**
The module now exposes OpenSSL's random generation functions RAND_bytes() and RAND_pseudo_bytes(). However, os.urandom() is still preferable for most applications.

**ssl Module Exceptions**
The module gained a finer-grained exception hierarchy: SSLZeroReturnError, SSLWantReadError, SSLWantWriteError, SSLSyscallError and SSLEOFError.

**SSLContext.load_cert_chain() Passwords**
The load_cert_chain() method now accepts a password parameter for cases where the private key is encrypted. It can be a str or bytes value containing the actual password, or a callable which will return the password. If specified, this overrides OpenSSL's default password-prompting mechanism.

**ssl Supports Additional Algorithms**
Diffie-Hellman key exchange, including the elliptic curve variant, is now supported via the load_dh_params() and set_ecdh_curve() methods. SSL sockets also now have a compression() method to query the current compression algorithm in use, and the SSL context now supports an OP_NO_COMPRESSION option to disable compression.

**ssl Next Protocol Negotiation**
ssl.SSLContext.set_npn_protocols() has been added to support the Next Protocol Negotiation (NPN) extension to TLS. This allows different application-level protocols to be specified in preference order. It was originally added to support Google's SPDY, and although SPDY is now deprecated (and superseded by HTTP/2) this extension is general in nature and still useful.

**ssl Error Introspection**
Instances of ssl.SSLError now have two additional attributes:

- library is a string indicating the OpenSSL subsystem responsible for the error (e.g. SSL, X509).
- reason is a string code indicating the reason for the error (e.g. CERTIFICATE_VERIFY_FAILED).

A few new data structures have been added as part of this release.
There's a new types.SimpleNamespace type which can be used in cases where you just want to hold some attributes. It's essentially just a thin wrapper around a dict which allows the keys to be accessed as attributes instead of being subscripted. It's also somewhat similar to an empty class definition, except that it has three main advantages, demonstrated in the snippet after this list:

- Instances can be initialised with attributes by passing keyword arguments to the constructor, as in types.SimpleNamespace(a=1, xyz=2).
- It has a helpful repr() which follows the usual guideline that eval(repr(x)) == x.
- Equality comparisons work like a dict, unlike the default equality of classes, which compares by the result of id().
There's a new collections.ChainMap class which can group together multiple mappings to form a single unified updateable view. The class overall acts as a mapping, and read lookups are performed across each mapping in turn with the first match being returned. Updates and additions are always performed in the first mapping in the list, and note that this may mask the same key in later mappings (but it will leave the original mapping intact).
>>> import collections
>>> a = {"one": 1, "two": 2}
>>> b = {"three": 3, "four": 4}
>>> c = {"five": 5}
>>> chain = collections.ChainMap(a, b, c)
>>> chain["one"]
1
>>> chain["five"]
5
>>> chain.get("ten", "MISSING")
'MISSING'
>>> list(chain.keys())
['five', 'three', 'four', 'one', 'two']
>>> chain["one"] = 100
>>> chain["five"] = 500
>>> chain["six"] = 600
>>> list(chain.items())
[('five', 500), ('one', 100), ('three', 3), ('four', 4), ('six', 600), ('two', 2)]
>>> a
{'five': 500, 'six': 600, 'one': 100, 'two': 2}
>>> b
{'three': 3, 'four': 4}
>>> c
{'five': 5}
There are a whole host of enhancements to the os, shutil and signal modules in this release which are covered below. I've tried to be brief, but include enough useful details for anyone who's interested but not immediately familiar.
**os.pipe2() Added**
The pipe2() call is now available. This allows flags to be set on the file descriptors atomically at creation. The O_NONBLOCK flag might seem the most useful, but it's for O_CLOEXEC (close-on-exec) that the atomicity is really essential. If you open a pipe and then try to set O_CLOEXEC separately, it's possible for a different thread to call fork() and execve() between these two steps, thus leaving the file descriptor open in the resultant new process (which is exactly what O_CLOEXEC is meant to avoid).
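Here's a quick illustration, on a platform which provides pipe2():

>>> import os
>>> r, w = os.pipe2(os.O_CLOEXEC | os.O_NONBLOCK)
>>> os.write(w, b"ping")
4
>>> os.read(r, 4)
b'ping'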
**os.sendfile() Added**
The sendfile() system call is now also available. This allows a specified number of bytes to be copied directly between two file descriptors entirely within the kernel, which avoids the overheads of a copy to and from userspace that read() and write() would incur. This is useful for, say, static file HTTP daemons.
**os.get_terminal_size() Added**
This new function queries the window size of the terminal attached to a given file descriptor, sys.stdout's by default. On Unix systems (at least) it probably uses the TIOCGWINSZ command with ioctl(), so if the file descriptor isn't attached to a terminal I'd expect you'd get an OSError due to an inappropriate ioctl() for the device. There's a higher-level shutil.get_terminal_size() discussed below which handles these errors, so it's probably best to use that in most cases.

Bugs and security vulnerabilities can result from the use of symlinks in the filesystem if you implement the pattern of first obtaining a target filename, and then opening it in a separate step. This is because the target of the symlink may be changed, either accidentally or maliciously, in the meantime. To avoid this, various os functions have been enhanced to deal with file descriptors instead of filenames, which sidesteps the issue. This can also offer improved performance.
Firstly, there's a new os.fwalk() function which is the same as os.walk() except that it accepts a directory file descriptor via the dir_fd parameter, and instead of a 3-tuple it returns a 4-tuple of (dirpath, dirnames, filenames, dir_fd). Secondly, many functions now support accepting a dir_fd parameter, and any path names specified should be relative to that directory (e.g. access(), chmod(), stat()). This is not available on all platforms, and attempting to use it when not available will raise NotImplementedError. To check support, os.supports_dir_fd is a set of the functions that support it on the current platform.
Thirdly, many of these functions also now support a follow_symlinks parameter which, if False, means they'll operate on the symlink itself as opposed to the target of the symlink. Once again, this isn't always available, and you risk getting NotImplementedError if you don't check the function is in os.supports_follow_symlinks.
Finally, some functions now also support passing a file descriptor instead of a path (e.g. chdir(), chown(), stat()). Support is optional for this as well, and you should check your functions are in os.supports_fd.
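As a brief sketch of the dir_fd style, on a platform where os.stat() supports it (the size shown is just from the system I tried and will obviously vary):

>>> import os
>>> dir_fd = os.open("/etc", os.O_RDONLY)
>>> os.stat("hosts", dir_fd=dir_fd).st_size
447
>>> os.close(dir_fd)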
**os.access() With Effective IDs**
There's a new effective_ids parameter which, if True, checks access using the effective UID/GID as opposed to the real identifiers. This is platform-dependent; check os.supports_effective_ids, which once again is a set of the functions supporting it.

**os.getpriority() & os.setpriority()**
These allow the scheduling priority of processes to be queried and updated, akin to os.nice() but for other processes too.

**os.replace() Added**
The behaviour of os.rename() is to overwrite the destination on POSIX platforms, but raise an error on Windows. Now there's os.replace() which does the same thing but always overwrites the destination on all platforms.

**Nanosecond File Timestamps**
os.stat(), os.fstat() and os.lstat() now support reading timestamps with nanosecond precision, where available on the platform. The os.utime() function supports updating nanosecond timestamps.

**Extended Filesystem Attributes**
These are now accessible via os.getxattr(), os.listxattr(), os.removexattr() and os.setxattr(). These are key/value pairs that can be associated with files to attach metadata for multiple purposes, such as supporting Access Control Lists (ACLs). Support for these is platform-dependent, not just on the OS but potentially on the underlying filesystem in use as well (although most of the Linux ones seem to support them).

**CPU Scheduling Functions**
The os module now allows access to the sched_*() family of functions which control CPU scheduling by the OS. You can find more details on the sched man page.

Support for some additional POSIX filesystem and other operations was added in this release:
- lockf() applies, tests or removes POSIX filesystem locks from a file.
- pread() and pwrite() read and write at a specified offset within a file descriptor, but without changing the current file descriptor offset.
- readv() and writev() provide scatter/gather reads and writes, where a single file can be read into, or written from, multiple separate buffers on the application side.
- truncate() truncates or extends the specified path to be an exact size. If the existing file was larger, excess data is lost; if it was smaller, it's padded with nul characters.
- posix_fadvise() allows applications to declare an intention to use a specific access pattern on a file, to allow the filesystem to potentially make optimisations. This can be an intention for sequential access, random access, or an intention to read a particular block so it can be fetched into the cache.
- posix_fallocate() reserves disk space for expansion of a particular file.
- sync() flushes any filesystem caches to disk.
- waitid() is a variant of waitpid() which allows more control over which child process state changes to wait for.
- getgrouplist() returns the list of group IDs to which the specified username belongs.

**os.times() and os.uname() Return Named Tuples**
Instead of plain tuple return types, these functions now return named tuples, which allows results to be accessed by attribute name.
**os.lseek() in Sparse Files**
lseek() now supports additional options for the whence parameter, os.SEEK_HOLE and os.SEEK_DATA. These start at a specified offset and find the nearest location which either has data, or is a hole in the data. They're only really useful in sparse files, because other files have contiguous data anyway.
**stat.filemode() Added**
This one's not strictly in the os module, but since the stat module is a companion to os.stat() I thought it most appropriate to cover here. The undocumented function tarfile.filemode() has been exposed as stat.filemode(), which converts a file mode such as 0o100755 into the string form -rwxr-xr-x.
**shlex.quote() Added**
This function previously lived in the pipes module, but it was undocumented. It escapes all characters in a string which might otherwise have special significance to a shell.

**shutil.disk_usage() Added**
This returns total, used and free disk space figures, similar to what you could already derive from os.statvfs(), but this wrapper is more convenient and also works on Windows, which doesn't provide statvfs().
**shutil.chown() Now Accepts Names**
This allows the user and group to be specified by name as well as by numeric ID.

**shutil.get_terminal_size() Added**
This is a higher-level way to query the terminal size: if the environment variables COLUMNS and LINES are defined, they're used. Otherwise, os.get_terminal_size() (mentioned above) is called on sys.stdout. If this fails for any reason, the fallback values passed as a parameter are returned; these default to 80x24 if not specified.
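For example (the result shown assumes the fallback is used; with an attached terminal you'd see its real dimensions):

>>> import shutil
>>> shutil.get_terminal_size(fallback=(100, 40))
os.terminal_size(columns=100, lines=40)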
**shutil.copy2() and shutil.copystat() Improvements**
These functions now preserve nanosecond-precision file timestamps, and on Linux they also preserve extended filesystem attributes, where the platform supports them.

**shutil.move() Symlinks**
This now handles symlinks as the POSIX mv does, re-creating the symlink instead of copying the contents of the target file when moving across filesystems, as used to be the previous behaviour. It also now returns the destination path for convenience.
**shutil.rmtree() Security**
On platforms which support the dir_fd parameter in os.open() and os.unlink(), it's now used by shutil.rmtree() to avoid symlink attacks.
The signal module has also gained access to several more POSIX functions:

- pthread_sigmask() allows querying and updating of the signal mask for the current thread. If you're interested in more details of the interactions between threads and signals, I found this article had some useful examples.
- pthread_kill() sends a signal to a specified thread ID.
- sigpending() is for examining the signals which are currently pending on the current thread or the process as a whole.
- sigwait() and sigwaitinfo() both block until one of a set of signals becomes pending, with the latter returning more information about the signal which arrived.
- sigtimedwait() is the same as sigwaitinfo() except that it only waits for a specified amount of time.

**signal.set_wakeup_fd() Now Writes Signal Numbers**
When using signal.set_wakeup_fd() to allow signals to wake up code waiting on file IO events (e.g. using the select module), the signal number is now written as the byte into this FD, whereas previously simply a nul byte was written regardless of which signal arrived. This allows the handler of that polling loop to determine which signal arrived, if multiple are being waited on.
**OSError Replaces RuntimeError in signal**
Where errors occur in signal.signal() and signal.siginterrupt(), they now raise OSError with an errno attribute, as opposed to a simple RuntimeError previously.

**subprocess Commands Can Be bytes**
Commands passed to the subprocess functions on POSIX platforms can now be bytes objects as well as strings.

**subprocess.DEVNULL Added**
This constant can be passed as any of the stdin, stdout or stderr arguments to redirect a stream to the null device, portably and without having to open os.devnull yourself.
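For instance, here's a minimal sketch of silencing a noisy command (the command itself is hypothetical):

>>> import subprocess
>>> subprocess.check_call(["make", "all"], stdout=subprocess.DEVNULL)
0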
Several of the objects in threading used to be factory functions returning instances, but are now real classes and hence may be subclassed. This change includes:
- threading.Condition
- threading.Semaphore
- threading.BoundedSemaphore
- threading.Event
- threading.Timer
**threading.Thread Constructor Accepts daemon**
A daemon keyword parameter has been added to the threading.Thread constructor to override the default behaviour of inheriting this flag from the parent thread.

**threading.get_ident() Exposed**
The previously private _thread.get_ident() is now exposed as a supported function threading.get_ident(), which returns the thread ID of the current thread.
The time module has several new functions which are useful. The first three of these are new clocks with different properties:

**time.monotonic()**
A clock which is guaranteed never to go backwards, even if the system clock is adjusted, which makes it suitable for measuring elapsed time.

**time.perf_counter()**
Similar to time.monotonic() but has the highest available resolution on the platform.

**time.process_time()**
Returns the CPU time consumed by the current process, rather than wall-clock time.

**time.get_clock_info()**
This function returns details about the specified clock, which could be any of the options above (passed as a string) or "time" for the details of the time.time() standard system clock. The result is an object which has the following attributes, demonstrated in the example after this list:

- adjustable is True if the clock may be changed by something external to the process (e.g. a system administrator or an NTP daemon).
- implementation is the name of the underlying C function called to provide the timer value.
- monotonic is True if the clock is guaranteed to never go backwards.
- resolution is the resolution of the clock in fractional seconds.
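For example, here's what the monotonic clock reports on the Linux machine I tried; the values will differ across platforms:

>>> import time
>>> time.get_clock_info("monotonic")
namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)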
The time module has also exposed the following underlying system calls to query the status of various system clocks:

- clock_getres() returns the resolution of the specified clock, in fractional seconds.
- clock_gettime() returns the current time of the specified clock, in fractional seconds.
- clock_settime() sets the time on the specified clock, if the process has appropriate privileges. The only clock for which that's currently supported is CLOCK_REALTIME.

The clocks which can be specified in this release are:

- time.CLOCK_REALTIME is the standard system clock.
- time.CLOCK_MONOTONIC is a monotonically increasing clock since some unspecified reference point.
- time.CLOCK_MONOTONIC_RAW provides access to the raw hardware timer that's not subject to adjustments.
- time.CLOCK_PROCESS_CPUTIME_ID counts CPU time on a per-process basis.
- time.CLOCK_THREAD_CPUTIME_ID counts CPU time on a per-thread basis.
- time.CLOCK_HIGHRES is a higher-resolution clock only available on Solaris.

This is a feature which is probably only of interest to a particular set of package maintainers, so I'm going to do my best not to drill into too much detail. However, there's a certain level of context required for this to make sense, so you can always skip to the next section if it gets too dull!
First I should touch on what a namespace package is in the first place. If you're a Python programmer, you'll probably be aware that the basic unit of code reusability is the module1. Modules can be imported individually, but they can also be collected into packages, which can contain modules or other packages. In their simplest forms, a module is a single .py file and a package is a directory which contains a file called __init__.py. The contents of this script are executed when the package is imported, but the very fact of the file's existence is what tags it as a package to Python, even if the file is empty.
So now we come to what on earth a namespace package is. Simply put, this is a logical package which presents a uniform name to be imported within Python code, but is physically split across multiple directories. For example, you may want to create a machinelearning package, which itself contains other packages like dimensionreduction, anomalydetection and clustering. For such a large domain, however, each of those packages is likely to consist of its own modules and subpackages, and have its own team of maintainers, and coordinating some common release strategy and packaging system across all those teams and repositories is going to be really painful. What you really want is for each team to package and ship its own code independently, but still have it all presented to the programmer as one uniform package. This would be a namespace package.
Python already had two approaches for doing this, one provided by setuptools and a later one provided by the pkgutil module in the standard library. Both of these rely on the namespace package providing some respective boilerplate __init__.py files to declare it as a namespace package. These are shown below for reference, but I'm not going to discuss them further because this section is about the new approach.
# The setuptools approach involves calling a function in __init__.py,
# and also requires some changes in setup.py.
__import__('pkg_resources').declare_namespace(__name__)
# The pkgutil approach just has each package add its own directory to
# the __path__ attribute for the namespace package, which defines the
# list of directories to search for modules and subpackages. This is
# more or less equivalent to a script modifying sys.path, but more
# carefully scoped to impact only the package in question.
__path__ = __import__('pkgutil').extend_path(__path__, __name__)
Both of these approaches share some issues, however. One of them is that when OS package maintainers (e.g. for Linux distributions) want somewhere to install these different things, they’d probably like to choose the same place, to keep things tidy. But this means all those packages are going to try and install an __init__.py
file over the top of each other, which makes things tricky — the OS packaging system doesn’t know these files necessarily contain the same things and will generate all sorts of complaints about the conflict.
The new approach, therefore, is to make these packages implicit, where there's no need for an __init__.py. You can just chuck some modules and/or sub-packages into a directory which is a subdirectory of something on sys.path and Python will treat that as a package and make the contents available. This is discussed in much more detail in PEP 420.
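As a sketch of how that plays out, imagine two hypothetical directories, each contributing part of the machinelearning namespace package mentioned earlier, with no __init__.py files anywhere:

# Hypothetical layout:
#   /path/one/machinelearning/clustering.py
#   /path/two/machinelearning/anomalydetection.py
>>> import sys
>>> sys.path.extend(["/path/one", "/path/two"])
>>> import machinelearning.clustering
>>> import machinelearning.anomalydetection    # found in the second directory
>>> sorted(machinelearning.__path__)
['/path/one/machinelearning', '/path/two/machinelearning']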
Beyond these rather niche use-cases of mega-packages, this feature seems like it should make life a little easier when creating regular packages too. After all, it's quite common that you don't really need any setup code in __init__.py, and creating that empty file just feels messy. So if we don't need it these days, why bother?
Well, as a result of this change it's true that regular packages can be created without the need for __init__.py, but the old approach is still the correct way to create a regular package, and it has some advantages. The primary one is that omitting __init__.py is likely to break existing tools which attempt to search for code, such as unittest, pytest and mypy to name just a few. It's also noteworthy that if you rely on namespace packages and then someone adds something to your namespace which contains an __init__.py, this ends the search process for the package in question since Python assumes this is a regular package. This means all your other implicit namespace packages will be suddenly hidden when the clashing regular package is installed. Using __init__.py consistently everywhere avoids this problem.
Furthermore, regular packages can be imported as soon as they're located on the path, but for namespace packages the entire path must be fully processed before the package can be created. The path entries must also be recalculated on every import, for example in case the user has added additional entries to sys.path which would contribute additional content to an existing namespace package. These factors can introduce performance issues when importing namespace packages.
There are also some more minor factors which favour regular packages which I’m including below for completeness but which I doubt will be particularly compelling for many people.
- Namespace packages don't have a __file__ attribute, and their __path__ attribute is read-only. These aren't likely a major issue for anyone, unless you have some grotty code which is trying to calculate paths relative to the source files in the package or similar.
- The setuptools.find_packages() function won't find these new-style namespace packages, although there is now a setuptools.find_namespace_packages() function which will, so it should be a fairly simple issue to modify setup.py appropriately.

As a final note, if you are having any issues with imports, I strongly recommend checking out Nick Coghlan's excellent article Traps for the Unwary in Python's Import System, which discusses some of the most common problems you might run into.
There are a set of small but useful changes in some of the builtins that are worth noting.
**open() Opener**
There's a new opener parameter for open() calls, which is a callable invoked with arguments (filename, flags) and expected to return an open file descriptor, as os.open() would. This can be used to, for example, pass flags which aren't supported by open() itself, but still benefit from the context manager behaviour offered by open().
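As a brief sketch, here's a hypothetical opener which adds O_NOFOLLOW on POSIX systems, so that opening via a symlink fails, something the standard mode strings can't express:

>>> import os
>>> def no_follow(path, flags):
...     return os.open(path, flags | os.O_NOFOLLOW)
...
>>> with open("data.txt", "w", opener=no_follow) as f:
...     f.write("hello\n")
...
6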
**open() Exclusively**
A new x mode was added for exclusive creation, failing if the file already exists. This is equivalent to the O_EXCL flag to open() on POSIX systems.
**print() Flushing**
print() now has a flush keyword argument which, if set to True, flushes the output stream immediately after the output.
**hash() Randomization**
String and bytes hashes are now randomised with a per-process seed by default, which protects against denial-of-service attacks that provoke worst-case hash collisions in dictionaries; the PYTHONHASHSEED environment variable can be used to control this behaviour.

**str.casefold()**
str objects now have a casefold() method to return a casefolded version of the string. This is intended to be used for case-insensitive comparisons, and is a much more Unicode-friendly approach than calling upper() or lower(). A full discussion of why is outside the scope of this article, but I suggest the excellent article Truths Programmers Should Know About Case by James Bennett for an informative discussion of the complexities of case outside of Latin-1 languages. Spoiler: it's harder than you think, which should always be your default assumption for any I18n issues2.
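The classic example is German, where casefolding maps ß to ss, which upper() and lower() don't do:

>>> "Straße".lower()
'straße'
>>> "Straße".casefold()
'strasse'
>>> "Straße".casefold() == "STRASSE".casefold()
True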
**copy() and clear()**
There are new copy() and clear() methods on both list and bytearray objects, with the obvious semantics.
**range Equality**
Comparisons are now supported between range objects, based on equality of the generated values. For example, range(3, 10, 3) == range(3, 12, 3). However, bear in mind this doesn't evaluate the actual contents, so range(3) != [0, 1, 2]. Also, applying transformations such as reversed seems to defeat these comparisons.
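To see this in action:

>>> range(3, 10, 3) == range(3, 12, 3)
True
>>> range(3) == [0, 1, 2]
False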
**dict.setdefault() Enhancement**
Previously dict.setdefault() resulted in two hash lookups, one to check for an existing item and one for the insertion. Since a hash lookup can call into arbitrary Python code, this meant that the operation was potentially non-atomic. This has been fixed in Python 3.3 to only perform the lookup once.
**bytes Methods Taking int**
The count(), find(), rfind(), index() and rindex() methods of bytes and bytearray objects now accept an integer in the range 0-255 to specify a single byte value.
**memoryview Changes**
The memoryview class has a new implementation which fixes several previous ownership and lifetime issues which had led to crash reports. This release also adds a number of features, such as better support for multi-dimensional lists and more flexible slicing.

There were some other additional and improved modules which I'll outline briefly below.
**bz2 Rewritten**
The bz2 module has been completely rewritten, adding several new features:

- There's a new bz2.open() function, which supports opening files in binary mode (where it operates just like the bz2.BZ2File constructor) or text mode (where it applies an io.TextIOWrapper).
- Any file-like object can now be wrapped by bz2.BZ2File using the fileobj parameter.
- The io.BufferedIOBase interface is now implemented by bz2.BZ2File, except for detach() and truncate().

**collections.abc**
The abstract base classes for containers now live in collections.abc. Aliases still exist at the top level of collections, however, to preserve backwards-compatibility.

**crypt.mksalt()**
There's a new crypt.mksalt() function to create the 2-character salt used by Unix passwords.

**datetime Improvements**
There are a few enhancements to the ever-useful datetime library.

- Equality comparisons between naive and timezone-aware datetime objects used to raise TypeError, but it was decided this was inconsistent with the behaviour of other incomparable types. As of Python 3.3 these will simply return False instead. Note that other comparisons will still raise TypeError, however.
- There's a new datetime.timestamp() method to return an epoch timestamp representation. This is implicitly in UTC, so timezone-aware datetimes will be converted and naive datetimes will be assumed to be in the local timezone and converted using the platform's mktime().
- datetime.strftime() now supports years prior to 1000 CE.
- datetime.astimezone() now assumes the system time zone if no parameters are passed.

**decimal Rewritten in C**
There's a new C implementation of the decimal module using the high-performance libmpdec. There are some API changes as a result which I'm not going to go into here, as I think most of them only impact edge cases.

**functools.lru_cache() Type Segregation**
Python 3.2 added the functools.lru_cache decorator for caching function results based on the parameters. This caching was based on checking the full set of arguments for equality with previous ones specified, and if they all compared equal then the cached result would be returned instead of calling the function. In this release, there's a new typed parameter which, if True, also enforces that the arguments are of the same type to trigger the caching behaviour. For example, calling a function with 3 and then 3.0 would return the cached value with typed=False (the default) but would call the function twice with typed=True.

**importlib**
importlib.__import__ is now used directly by __import__(). A number of other changes have had to happen behind the scenes to make this work, but now it means that the import machinery is fully exposed as part of importlib, which is great for transparency and for any code which needs to find and import modules programmatically. I considered this a little niche to cover in detail, but the release notes have some good discussion on it.

**io.TextIOWrapper Buffering Optional**
io.TextIOWrapper has a new optional write_through argument. If set to True, write() calls are guaranteed not to be buffered but will be immediately passed to the underlying binary buffer.

**itertools.accumulate() Supports Custom Function**
accumulate() now accepts a func parameter to replace the default addition; for example, func=operator.mul would give a running product of values.

**logging.basicConfig() Supports Handlers**
There's a new handlers parameter on logging.basicConfig() which takes an iterable of handlers to be added to the root logger. This is probably handy for those scripts that are just large enough to be worth using logging, particularly if you consider the code might one day form the basis of a reusable module, but which aren't big enough to mess around setting up a logging configuration file.

**lzma Added**
This new module supports LZMA compression, as used by the xz utility. The library supports the .xz file format, and also the .lzma legacy format used by earlier versions of this utility.

**math.log2() Added**
Although equivalent to math.log(x, 2), this will often be faster and/or more accurate than the existing approach, which involves the usual division of logs to convert the base.

**pickle Dispatch Tables**
The pickle.Pickler class constructor now takes a dispatch_table parameter which allows the pickling functions to be customised on a per-type basis.

**sched Improvements**
The sched module, for generalised event scheduling, has had a variety of improvements made to it:

- run() can now be passed blocking=False to execute pending events and then return without blocking. This widens the scope of applications which can use the module.
- sched.scheduler can now be used safely in multithreaded environments.
- The parameters of the sched.scheduler constructor now have sensible defaults.
- The enter() and enterabs() methods now no longer require the argument parameter to be specified, and also support a kwargs parameter to pass values by keyword to the callback.

**sys.implementation**
There's a new sys.implementation attribute which holds information about the current implementation being used. A full list of the attributes is beyond the scope of this article, but as one example sys.implementation.version is a version tuple in the same format as sys.version_info. The former contains the implementation version whereas the latter specifies the Python language version implemented; for CPython the two will be the same, since this is the reference implementation, but for cases like PyPy the two will differ. PEP 421 has more details.

**tarfile Supports LZMA**
Support for LZMA compression was added, using the lzma module mentioned above.

**textwrap Indent Function**
The new indent() function allows a prefix to be added to every line in a given string. This functionality has been in the textwrap.TextWrapper class for some time, but is now exposed as its own function for convenience.

**xml.etree.ElementTree C Extension**
The module now uses its C accelerator by default, removing the need to explicitly use xml.etree.cElementTree, although that module remains for backwards compatibility.

**zlib EOF**
The zlib module now has a zlib.Decompress.eof attribute which is True if the end of the stream has been reached. If this is False but there is no more data, it indicates that the compressed stream has been truncated.

As usual, there were some minor things that struck me as less critical, but I wanted to mention nonetheless.
**bytes Literals**
Raw str literals are written r"..." and bytes literals are b"...". Previously, combining these required br"...", but as of Python 3.3 rb"..." will also work. Rejoice in the syntax errors thus avoided.

**u"..." Literals**
u"..." literals are once again supported for str objects. This has no semantic significance in Python 3, since it is the default; it simply eases porting code from Python 2.

**Windows Launcher**
Python on Windows now ships with a launcher which runs .py files when double-clicked. It even checks the shebang line to determine the Python version to use, if multiple are available.

**Key-Sharing Dictionaries**
The dict implementation used for holding attributes of objects has been updated to allow it to share the memory used for the key strings between multiple instances of a class. This can save 10-20% on memory footprints on heavily object-oriented code, and increased locality also achieves some modest performance improvements of up to 10%. PEP 412 has the full details.
So that's Python 3.3, and what a lot there was in it! The yield from support is handy, but really just a taster of the proper coroutines that are coming in future releases with the async keyword. The venv module is a bit of a game-changer in my opinion, because now that everyone can simply rely on it being there we can do a lot better at documenting and automating development and runtime setups of Python applications. Similarly, the addition of unittest.mock means everyone can use the powerful mocking features it provides to enhance unit tests without having to add to their project's development-time dependencies. Testing is something where you want to lower the barrier as much as you can, to encourage everyone to use it freely.
The other thing that jumped out to me about this release in particular was the sheer breadth of new POSIX functions and other operating system functionality that are now exposed. It’s always a pet peeve of mine when my favourite system calls aren’t easily exposed in Python, so I love to see these sweeping improvements.
So all in all, no massive overhauls, but a huge array of useful features. What more could you ask from a point release?
This could be pure Python or an extension module in another language like C or C++, but that distinction isn't important for this discussion. ↩
Or if you really want the nitty gritty, feel free to peruse §3.13 of the Unicode standard. But if you do — and with sincere apologies to the authors of the Unicode standard who’ve forgotten more about international alphabets than I’ll ever know — my advice is to brew some strong coffee first. ↩
Well, since you asked that’s specifically RFC 2045, RFC 2046, RFC 2047, RFC 4288, RFC 4289 and RFC 2049. ↩
The default header_factory is documented in the email.headerregistry module. ↩
And let’s be honest, my “niche filter” is so close to the identity function that they could probably share a lawnmower. I tend to only miss out the things that apply to only around five people, three of whom don’t even use Python. ↩
However, since KEXTs have more recently been replaced with system extensions, which run in user-space rather than in the kernel, I don't know whether the PF_SYSTEM protocols are going to remain relevant for very long. ↩