In this series looking at features introduced by every version of Python 3, this is the fourth of those covering Python 3.5. In it we look at the major updates to the standard library which were made in this release, including various Internet protocol module enhancements, some asyncio features, and the lifting of some restrictions on regular expression syntax.
This is the 11th of the 34 articles that currently make up the “Python 3 Releases” series.
As you’ve perhaps read in the preceding three articles in this series, Python 3.5 had quite an impressive number of new features. However, this didn’t come at the expense of the usual widespread improvements to the standard library as well, and in this fourth and final article on Python 3.5 we’ll go through most of these.
There are some further asyncio improvements, a whole lot of improvements in networking modules, some handy changes in the pathlib module, and more. But let’s kick off with some changes to containers.
There are a handful of improvements in the collections, collections.abc and enum modules.
OrderedDict Improvements

Firstly, OrderedDict has been re-written in C for better performance. The release notes say this should improve its performance 4-100x, which basically means it should be faster, but exactly how much depends very much on what you’re doing with it.
Secondly, the items(), keys() and values() views now support iteration with reversed().
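As a quick illustration:

>>> from collections import OrderedDict
>>> d = OrderedDict([("one", 1), ("two", 2), ("three", 3)])
>>> list(reversed(d.keys()))
['three', 'two', 'one']
>>> list(reversed(d.items()))
[('three', 3), ('two', 2), ('one', 1)]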
deque Now a MutableSequence

The deque class has added index(), insert() and copy() methods and also supports the + and * operators, which all together means it now fulfills the requirements of the MutableSequence abstract base class. This means deques can be used to replace list in most contexts, which is handy because for FIFO-like use-cases they’re great data structures.
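Here’s a quick demonstration of the new methods and operators:

>>> from collections import deque
>>> d = deque("bcd")
>>> d.index("c")
1
>>> d.insert(0, "a")
>>> d + deque("e")
deque(['a', 'b', 'c', 'd', 'e'])
>>> d * 2
deque(['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'])
>>> d.copy()
deque(['a', 'b', 'c', 'd'])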
collections.abc.Generator Added

The previous omission of a Generator abstract base class was an oversight, and it’s good to see it corrected in this release. This becomes particularly relevant in 3.9 when the abstract base classes support direct use as type hints, but that’s quite a few articles away yet. There are also some other new abstract base classes for async support, namely Awaitable, Coroutine, AsyncIterator, and AsyncIterable.
enum.Enum Takes start Parameter

When creating an enumeration with the functional API, the new start parameter specifies the initial integer value.
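For example:

>>> from enum import Enum
>>> Animal = Enum("Animal", "ANT BEE CAT", start=10)
>>> list(Animal)
[<Animal.ANT: 10>, <Animal.BEE: 11>, <Animal.CAT: 12>]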
There’s a minor but useful change in the gzip module to allow the x character to be added to the mode argument to request exclusive creation of the file.
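For example, the second attempt here fails because the file already exists (path hypothetical):

>>> import gzip
>>> f = gzip.GzipFile("/tmp/data.gz", mode="xb")
>>> f.write(b"hello\n")
6
>>> f.close()
>>> gzip.GzipFile("/tmp/data.gz", mode="xb")
Traceback (most recent call last):
  ...
FileExistsError: [Errno 17] File exists: '/tmp/data.gz'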
In a similarly small fashion, lzma.LZMADecompressor.decompress() now accepts an optional max_length parameter to put an upper limit on the size of the decompressed data, so you don’t accidentally fill up your disk.
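A sketch of bounded decompression; when the limit cuts the output short, the needs_input attribute indicates more output is buffered internally:

>>> import lzma
>>> data = lzma.compress(b"A" * 100000)
>>> decomp = lzma.LZMADecompressor()
>>> chunk = decomp.decompress(data, max_length=4096)
>>> len(chunk)
4096
>>> decomp.needs_input
False
>>> len(decomp.decompress(b"", max_length=4096))
4096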
In the tarfile module, open() now accepts the x mode modifier to require exclusive creation of the file. Also, the TarFile.extract() and TarFile.extractall() methods now accept a boolean numeric_owner parameter — if True, the numeric UID and GID from the tarfile are used to set ownership of extracted files, as opposed to the normal behaviour of translating the names into whatever IDs are in use on the local system.
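For example (paths hypothetical; note that setting ownership on extraction generally requires running as root anyway):

>>> import tarfile
>>> with tarfile.open("/tmp/backup.tar", "x") as tar:   # fails if it already exists
...     tar.add("/etc/hosts")
...
>>> with tarfile.open("/tmp/backup.tar") as tar:
...     tar.extractall("/tmp/restore", numeric_owner=True)
...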
The zipfile module now supports writing output to streams that don’t support seeking. Also, the ZipFile constructor now accepts an x in the mode string, to require exclusive creation of a new file.
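For example (paths hypothetical):

>>> import zipfile
>>> with zipfile.ZipFile("/tmp/new.zip", mode="x") as zf:   # exclusive creation
...     zf.write("/etc/hosts", arcname="hosts")
...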
Several of the modules for concurrency saw some incremental improvements, including asyncio, concurrent.futures and multiprocessing.
There are a number of assorted improvements to asyncio in this release, most of which are discussed below.
set_debug() and get_debug() Added

These new methods on event loop objects enable/disable additional runtime checks. These include better context on exception tracebacks to indicate the affected task, warnings when operations are performed which may not be thread-safe, and warnings when asynchronous calls take longer than a certain time to return.
loop.is_closed() Added

The new is_closed() method on event loops checks whether the loop has been closed yet.
async() Replaced

The async() function is deprecated; ensure_future() should now always be used instead.
Task Factories Added

It’s now possible to customise the creation of Task objects, using the event loop’s new set_task_factory() and get_task_factory() methods. This factory will be used by the create_task() method, and the factory must be a callable that takes two parameters: the event loop to which the task is being added, and the coroutine to form the body of the task.
asyncio.Queue Additions

The asyncio.Queue class is similar to that provided by the queue module, except that the get() and put() methods are coroutines. In this release a new join() coroutine method was added to block until the queue has no outstanding pending items. Note that this doesn’t just mean all items have been retrieved with get(), it also requires them to be processed — this is indicated by consumers calling a new task_done() method (not a coroutine) once each fetched task is complete.
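As a brief sketch of how join() and task_done() fit together, using the new async/await syntax covered earlier in this series:

import asyncio

async def producer(queue):
    for item in range(3):
        await queue.put(item)

async def consumer(queue):
    while True:
        item = await queue.get()
        print("processed", item)
        queue.task_done()   # mark this item as fully dealt with

async def main(loop):
    queue = asyncio.Queue()
    worker = loop.create_task(consumer(queue))
    await producer(queue)
    await queue.join()      # blocks until task_done() matches every put()
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass

loop = asyncio.get_event_loop()
loop.run_until_complete(main(loop))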
run_coroutine_threadsafe() Added

This new function submits coroutines to an event loop safely from other threads.

loop.create_future() Added

Event loops have gained a create_future() method to create new Future objects, so alternative loop implementations can provide fast implementations. This method should be used going forward.
StreamReader.readuntil() Added

The StreamReader class, for working with network connections, now has a handy readuntil() method which reads data up to and including a specified separator, and then removes that from the buffer and returns it. Handy for reading, say, HTTP headers up to the terminating \r\n\r\n.
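Here’s a sketch of readuntil() in action; feeding a StreamReader by hand isn’t how you’d normally obtain one, but it keeps the example self-contained:

import asyncio

async def read_headers(reader):
    # reads up to and including the blank line ending the headers
    raw = await reader.readuntil(b"\r\n\r\n")
    return raw.decode("ascii")

reader = asyncio.StreamReader()
reader.feed_data(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\nleftover")
loop = asyncio.get_event_loop()
print(loop.run_until_complete(read_headers(reader)))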
Moving on to concurrent.futures, there are a couple of changes to allow better performance. Firstly, Executor.map() now offers a chunksize parameter which allows the code to control the batch size when assigning tasks to child processes with ProcessPoolExecutor. This is particularly convenient when tasks are quick, so the overhead of pushing them out individually would be considerable.
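As a sketch of this first change:

import concurrent.futures

def square(x):
    return x * x

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # send work to the workers in batches of 1000 rather than one
        # at a time, slashing the inter-process communication overhead
        results = list(executor.map(square, range(100000), chunksize=1000))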
Secondly, the number of workers in the ThreadPoolExecutor is now optional, and defaults to 5x the number of CPUs. This makes it simpler to write code that can make best use of available resources on any platform.
Some improvements to the handy inspect and logging modules, as well as unittest.
The BoundArguments object represents the result of binding argument values to parameters from a Signature object. In Python 3.5, this has acquired a new apply_defaults() method which will add any default values for arguments not already bound in that instance. This is probably best demonstrated with an example.
>>> import inspect
>>> def my_func(arg1, arg2, arg3="three", arg4="four",
...             *args, kwarg1="eins"): pass
...
>>> signature = inspect.signature(my_func)
>>> bound_args = signature.bind("un", "deux", "troi")
>>> bound_args
<BoundArguments (arg1='un', arg2='deux', arg3='troi')>
>>> bound_args.apply_defaults()
>>> bound_args
<BoundArguments (arg1='un', arg2='deux', arg3='troi', arg4='four', args=(), kwarg1='eins')>
>>> bound_args.args
('un', 'deux', 'troi', 'four')
>>> bound_args.kwargs
{'kwarg1': 'eins'}
The signature() function itself has also gained a new follow_wrapped optional keyword argument. This controls whether to follow the __wrapped__ attribute that decorators add to link to the wrapped function, which we discussed in the article on Python 3.2. This defaults to True, but you can now override it to False if you want the signature of the actual callable you pass, without following the __wrapped__ chain.
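For example, with a decorator built using functools.wraps():

import functools
import inspect

def log_calls(func):
    @functools.wraps(func)   # sets wrapper.__wrapped__ = func
    def wrapper(*args, **kwargs):
        print("calling", func.__name__)
        return func(*args, **kwargs)
    return wrapper

@log_calls
def add(a, b):
    return a + b

print(inspect.signature(add))                         # (a, b)
print(inspect.signature(add, follow_wrapped=False))   # (*args, **kwargs)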
There are also some new functions to inspect coroutine objects and functions:
iscoroutine(): Returns True if the specified object is a coroutine, as returned from a coroutine function.

iscoroutinefunction(): Returns True if the specified function was defined with async def.

isawaitable(): Returns True if the specified object can be used as the target of an await expression, which can be a coroutine or any object with an __await__() method.

getcoroutinelocals(): The coroutine equivalent of getgeneratorlocals(), which was added in Python 3.3. In short, it returns a dict of the current values of local variables within the specified coroutine.

getcoroutinestate(): Returns the current state of the specified coroutine, which is one of CORO_CREATED, CORO_RUNNING, CORO_SUSPENDED and CORO_CLOSED.
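A quick demonstration of some of these:

import asyncio
import inspect

async def greet():
    await asyncio.sleep(0)

print(inspect.iscoroutinefunction(greet))   # True
coro = greet()
print(inspect.iscoroutine(coro))            # True
print(inspect.isawaitable(coro))            # True
print(inspect.getcoroutinestate(coro))      # CORO_CREATED
asyncio.get_event_loop().run_until_complete(coro)
print(inspect.getcoroutinestate(coro))      # CORO_CLOSED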
All logging methods which produce a log entry accept an exc_info argument, which you can set to True to include details of the currently handled exception (if any), or to a 3-tuple of (type, value, traceback) as returned by sys.exc_info() to log details of the specified exception. The change in Python 3.5 is that you can now also just pass an instance of an exception to trigger this latter behaviour, which is convenient.
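For example:

import logging

try:
    1 / 0
except ZeroDivisionError as exc:
    # previously you'd pass exc_info=True or sys.exc_info(); now the
    # exception instance itself is enough
    logging.error("calculation failed", exc_info=exc)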
The handlers.HTTPHandler class, which supports sending log messages to a web server via GET or POST, has been improved so that you can optionally pass an ssl.SSLContext instance to configure SSL settings used for the HTTP connection. As remote logging is likely to be the sort of thing where you’d want some authentication and confidentiality, this seems like a useful change.
The handlers.QueueListener is not actually a handler, but a companion to the handlers.QueueHandler class. The QueueHandler supports writing log messages to a queue, such as that provided by the queue or multiprocessing modules, and the QueueListener watches this queue and processes the log messages so enqueued. This allows threads and other processes to quickly deal with generating log messages without the overhead of potentially expensive handlers, and have the log messages dealt with asynchronously in a different thread.

This was actually added in Python 3.2, but the reason I’m mentioning it here is that in Python 3.5 a respect_handler_level parameter has been added to its constructor. If True, this class will filter log messages according to the threshold log level of each registered handler. Prior to this, every log message would always be passed to every handler.
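Here’s a small example where the handler’s WARNING threshold is respected:

import logging
import logging.handlers
import queue

log_queue = queue.Queue()
console = logging.StreamHandler()
console.setLevel(logging.WARNING)

listener = logging.handlers.QueueListener(
    log_queue, console, respect_handler_level=True)
listener.start()

logger = logging.getLogger("worker")
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.handlers.QueueHandler(log_queue))

logger.warning("passed through to the console handler")
logger.info("enqueued, but dropped by the handler's WARNING threshold")
listener.stop()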
In unittest, the TestLoader.loadTestsFromModule() method takes an optional pattern keyword argument. If a module defines a load_tests() function, to customise how tests are loaded, the pattern argument is passed as the third parameter to load_tests(). This allows the set of tests to be loaded to be filtered.
Errors during discovery are now exposed as TestLoader.errors, which could be useful for those running tests automatically as opposed to interactively. Also, when executing on the command-line there’s a new --locals flag which includes local variables in backtraces, for easier diagnosis of the reasons for failures.
The unittest.mock module also has some handy changes. First up is a fix for an irritating issue caused when you mistype an assertX() method on a Mock object. Because these objects manufacture attributes on demand, the mistyped name would be treated as a regular method and the test would not raise a failure despite the error. That is until Python 3.5, because now any method name with a prefix of assert (or assret, a common typo) will cause an immediate AttributeError. If you happen to have such methods that you legitimately want to mock, you can disable this behaviour by passing unsafe=True to the Mock constructor.
There’s a new assert_not_called() method on Mock, which is slightly more readable than manually asserting that call_count is zero. The MagicMock class has also been enhanced to support a few new special methods, namely __truediv__(), __divmod__() and __matmul__().
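A quick demonstration of both the typo protection and assert_not_called():

from unittest import mock

m = mock.Mock()
m.method()
m.method.assert_called_with()    # a real assertion, which passes
m.other.assert_not_called()      # passes: other() was never called

try:
    m.method.assert_caled_with() # note the typo
except AttributeError as exc:
    # pass unsafe=True to the Mock constructor to allow such names
    print("caught typo:", exc)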
Some changes to make configparser more flexible, some json convenience changes, and some corners of re filled out to improve its expressiveness even more.
The constructor of ConfigParser now allows custom converters to be registered. You do this by passing the converters parameter as a dictionary mapping type names to functions which take the string value as an argument and return the converted value.

These work the same way as the built-in getint(), getfloat() and getboolean() methods in that ConfigParser doesn’t try to guess the type of values or convert them on reading — instead, you choose the conversion when you query the value. Any exceptions raised during conversion will occur when you query the value using the accessor. If you register a type "foo" then there’ll be a getfoo() method, so be careful when choosing the names of your types in the dictionary.
Here’s an example:
>>> import configparser
>>> import ipaddress
>>> convs = {
... "intlist": lambda x: [int(i.strip()) for i in x.split(",")],
... "ipaddr": lambda x: ipaddress.ip_address(x)
... }
>>> parser = configparser.ConfigParser(converters=convs)
>>> parser.read_string("""
... [my_section]
... one_value = 2, 3, 5, 7, 11, 13, 17
... another_value = 10.1.200.15
... some_other_value = 2002:0a01:c80f::
... """)
>>> parser.get("my_section", "one_value")
'2, 3, 5, 7, 11, 13, 17'
>>> parser.getintlist("my_section", "one_value")
[2, 3, 5, 7, 11, 13, 17]
>>> parser.get("my_section", "another_value")
'10.1.200.15'
>>> parser.getipaddr("my_section", "another_value")
IPv4Address('10.1.200.15')
>>> parser.getipaddr("my_section", "some_other_value")
IPv6Address('2002:a01:c80f::')
>>> parser.getipaddr("my_section", "one_value")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/andy/.pyenv/versions/3.5.10/lib/python3.5/configparser.py", line 806, in _get_conv
**kwargs)
File "/Users/andy/.pyenv/versions/3.5.10/lib/python3.5/configparser.py", line 800, in _get
return conv(self.get(section, option, **kwargs))
File "<stdin>", line 3, in <lambda>
File "/Users/andy/.pyenv/versions/3.5.10/lib/python3.5/ipaddress.py", line 54, in ip_address
address)
ValueError: '2, 3, 5, 7, 11, 13, 17' does not appear to be an IPv4 or IPv6 address
A couple of little enhancements to json. Firstly, json.tool now preserves the ordering of keys in JSON objects, unless --sort-keys is passed, which will re-sort them lexicographically.
Secondly, decoding JSON now throws json.JSONDecodeError instead of ValueError for better context — however, since the former is a subclass of the latter, existing code that catches ValueError should continue to work.
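For example:

import json

try:
    json.loads('{"name": }')
except json.JSONDecodeError as exc:
    # still catchable as ValueError for backwards compatibility
    print(exc.msg, "at line", exc.lineno, "column", exc.colno)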
There are some changes with regards to matching groups in Python 3.5.
In Python 3.4 and prior, lookbehind assertions were not allowed to contain references to match groups. When compiling the pattern you’d get a warning, and then it would simply fail to match when used.
Python 3.4.10 (default, Mar 28 2021, 04:12:21)
[GCC Apple LLVM 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> pattern = re.compile(r"(ABC|XYZ)...(?<=\1)DEF")
/Users/andy/.pyenv/versions/3.4.10/lib/python3.4/sre_parse.py:361: RuntimeWarning: group references in lookbehind assertions are not supported
RuntimeWarning)
>>> assert pattern.match("ABCABCDEF")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AssertionError
However, in Python 3.5 group references are permitted in lookbehind assertions, although only if they evaluate to a fixed-width pattern. You can also use conditional group references, such as (?(1)ABC|DEF)1.
Python 3.5.10 (default, Mar 28 2021, 04:14:51)
[GCC Apple LLVM 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>>
>>> pattern = re.compile(r"(ABC|XYZ)...(?<=\1)DEF")
>>> assert pattern.match("ABCABCDEF")
>>>
>>> pattern = re.compile(r"((AAA|BBB):)?...(?<=(?(2)\2|XXX)):CCC")
>>> assert pattern.match("AAA:AAA:CCC")
>>> assert not pattern.match("BBB:AAA:CCC")
>>> assert pattern.match("BBB:BBB:CCC")
>>> assert not pattern.match("DDD:DDD:CCC")
>>> assert not pattern.match("DDD:CCC")
>>> assert pattern.match("XXX:CCC")
>>>
There’s also a change with regard to using matching groups in the replacement string of re.sub() and re.subn(). In Python 3.4, if a group failed to match in the source string and it was referenced in the replacement, then an exception would be raised:
Python 3.4.10 (default, Mar 28 2021, 04:12:21)
[GCC Apple LLVM 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import re
>>> re.sub("(AAA:)?BBB", r"<<\1>>", ":::AAA:BBB:::")
':::<<AAA:>>:::'
>>> re.sub("(AAA:)?BBB", r"<<\1>>", ":::BBB:::")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/andy/.pyenv/versions/3.4.10/lib/python3.4/re.py", line 179, in sub
return _compile(pattern, flags).sub(repl, string, count)
File "/Users/andy/.pyenv/versions/3.4.10/lib/python3.4/re.py", line 331, in filter
return sre_parse.expand_template(template, match)
File "/Users/andy/.pyenv/versions/3.4.10/lib/python3.4/sre_parse.py", line 888, in expand_template
raise error("unmatched group")
sre_constants.error: unmatched group
However, in Python 3.5 instead of raising an exception, the group reference is simply replaced with an empty string:
Python 3.5.10 (default, Mar 28 2021, 04:14:51)
[GCC Apple LLVM 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.sub("(AAA:)?BBB", r"<<\1>>", ":::AAA:BBB:::")
':::<<AAA:>>:::'
>>> re.sub("(AAA:)?BBB", r"<<\1>>", ":::BBB:::")
':::<<>>:::'
In addition to these changes, the number of matching groups is no longer limited to 100 as it was previously. I might tentatively suggest, however, that if you have a case for a regular expression with more than 100 match groups, you should strongly consider using a more robust parsing mechanism than regular expressions. Or at least breaking your expression up into a set of smaller regular expressions.
Finally, unrelated to matching groups, re.error instances have new attributes that provide more context on the error in question. This means that the errors you get pinpoint where in the pattern the error is occurring.
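For example:

import re

try:
    re.compile(r"(?P<year>\d{4}")   # missing closing parenthesis
except re.error as exc:
    print(exc.msg)       # e.g. "missing ), unterminated subpattern"
    print(exc.pos)       # offset into the pattern of the error
    print(exc.pattern)   # the pattern itself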
Quite a batch of changes to the networking modules this release. First we’ll get some smaller details out of the way, and then run through the modules with more significant updates.
The IMAP4 class has got the context manager treatment, so it can now be used with the with statement. At the end of the enclosed block, it will automatically send the LOGOUT command.
The module also now supports UTF-8, implementing RFC 6855 and its pre-requisite RFC 5161. To activate this, the option must be enabled by passing UTF8=ACCEPT to IMAP4.enable() — the addition of this method is what supports RFC 5161. Whether this was successfully negotiated can be checked with IMAP4.utf8_enabled. Also, non-ASCII usernames and passwords are now possible, being encoded with UTF-8.
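Putting those together (server and credentials hypothetical):

import imaplib

with imaplib.IMAP4_SSL("imap.example.com") as conn:
    conn.login("andy", "correct-horse-battery-staple")
    conn.enable("UTF8=ACCEPT")   # requires the server to support RFC 6855
    conn.select("INBOX")
# LOGOUT is sent automatically when the block exits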
Similarly to imaplib, poplib now supports UTF-8 as per RFC 6856. This can be activated by calling POP3.utf8(), which returns the server response if successful or raises an error_proto exception if not.
The email module has a few more options. The Policy.mangle_from_ option controls whether lines that start with From in email bodies have a > character inserted. This was done before to avoid confusion with the header of the same name, but is now disabled by default in all policies except compat32 (for backwards compatibility).
The Message and EmailMessage classes now have a get_content_disposition() method which makes it easier to determine the canonical value for this important header, primarily to determine whether this is an inline or attachment MIME part. It’s worth remembering that in a multipart message, the payload is a list of such instances, so each can have its own disposition value.
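For example:

import email

msg = email.message_from_string(
    "Content-Type: application/pdf\n"
    "Content-Disposition: attachment; filename=\"report.pdf\"\n"
    "\n"
)
print(msg.get_content_disposition())   # attachment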
There are also changes to support UTF-8 headers and passing email.charset.Charset instances to the mime.text.MIMEText constructor, but these are a little esoteric to go into any details on.
There’s a new HTTPStatus enum which defines HTTP status codes. As well as the usual name and value attributes, each entry also defines a phrase and description; see the example below.
>>> http.HTTPStatus.NOT_FOUND.value
404
>>> http.HTTPStatus.NOT_FOUND.name
'NOT_FOUND'
>>> http.HTTPStatus.NOT_FOUND.phrase
'Not Found'
>>> http.HTTPStatus.NOT_FOUND.description
'Nothing matches the given URI'
>>> http.HTTPStatus.TOO_MANY_REQUESTS.description
'The user has sent too many requests in a given amount of time ("rate limiting")'
In http.client there’s also a new RemoteDisconnected exception which will be raised if the server closes the connection before at least the status line of the response can be returned. Previously the slightly less helpful BadStatusLine was raised with an empty status line, which could cause some confusion.
This exception is a subclass of ConnectionResetError, which is itself derived from ConnectionError. In a related change, any ConnectionError now causes the underlying socket to be closed, to be reopened on the next request. This makes a great deal of sense since these errors typically mean either there’s no connection, or you’ve somehow become out of sync with the server, and the best option is to reconnect.
A couple of minor improvements to ipaddress. Firstly, the constructors for IPv4Network and IPv6Network now accept a 2-tuple of (address, netmask), where the latter can be an integer number of bits or a mask in dotted-decimal notation.
Secondly, addresses now have a reverse_pointer attribute which returns the name of the PTR record used for reverse DNS lookups.
>>> import ipaddress
>>> ipaddress.IPv4Network(("93.184.0.0", 16))
IPv4Network('93.184.0.0/16')
>>> ipaddress.IPv4Network(("93.184.0.0", "255.255.0.0"))
IPv4Network('93.184.0.0/16')
>>> ipaddress.IPv4Address("93.184.216.34").reverse_pointer
'34.216.184.93.in-addr.arpa'
For those running a mailserver written in Python, you might like to be aware that smtpd has a few changes to support UTF-8. Those of you who prefer Google to read, sorry, handle your email instead can probably skip straight to the next section.
The SMTPServer and SMTPChannel constructors can now be passed a decode_data keyword parameter. If True, data is decoded as UTF-8 before being passed as a str to the process_message() function, which applications are expected to implement to handle incoming messages. If False, however, messages are just passed as a raw bytes object instead. The default is True in Python 3.5 for backwards compatibility, but the intention is to change the default to False in Python 3.6, as a framework like this shouldn’t necessarily assume UTF-8 encoding.
Additionally, if decode_data is False then the server advertises the 8BITMIME SMTP extension as specified by RFC 6152. This permits 8-bit characters to be used, whereas the original SMTP specification required characters to be 7-bit. If a client application specifies BODY=8BITMIME on the MAIL line then this will be passed to process_message() using a new mail_options keyword parameter, which is a list of all the options the client specified.
There’s additionally support for RFC 6531, which extends SMTP to support UTF-8 in mailbox names and header fields. This is only advertised by the server if enable_SMTPUTF8=True is passed to the SMTPServer or SMTPChannel constructor. If the client application wishes to utilise this, it adds SMTPUTF8 to the MAIL line and it’s again passed into process_message() via mail_options. It’s the responsibility of the implementation of this method to handle the UTF-8 values appropriately.
Finally, both the local bind address and the upstream SMTP relayer, which are passed as localaddr and remoteaddr respectively to the SMTPServer constructor, can now be specified as IPv6 addresses.
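Putting some of this together, here’s a minimal sketch of a server which just prints what it receives (host and port arbitrary):

import asyncore
import smtpd

class PrintingServer(smtpd.SMTPServer):
    def process_message(self, peer, mailfrom, rcpttos, data, **kwargs):
        # with decode_data=False, data arrives as raw bytes, and any
        # options from the MAIL line show up in kwargs["mail_options"]
        print(peer, mailfrom, rcpttos, len(data), kwargs.get("mail_options"))

server = PrintingServer(("localhost", 8025), None, decode_data=False)
asyncore.loop()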
The smtplib module supports several authentication methods2 and typically you’d call SMTP.login(), passing in a username and password, to use whichever is the best one supported by the server. However, if you want to use an authentication method not supported by the library (e.g. DIGEST-MD5) then things are considerably trickier. Until now, that is, with the addition of the SMTP.auth() method. Code implementing a different auth method passes in a callable object which is used to process the server’s challenge.
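As a sketch of what such a callable might look like, here’s CRAM-MD5 implemented by hand; smtplib actually supports CRAM-MD5 natively, so this is purely for illustration (server and credentials hypothetical):

import hmac
import smtplib

USER, PASSWORD = "andy", "secret"

def cram_md5(challenge=None):
    # respond to the server's CRAM-MD5 challenge as per RFC 2195; the
    # challenge arrives already base64-decoded as bytes
    digest = hmac.new(PASSWORD.encode("ascii"), challenge, "md5").hexdigest()
    return "{} {}".format(USER, digest)

smtp = smtplib.SMTP("mail.example.com")
smtp.auth("CRAM-MD5", cram_md5)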
In addition, both SMTP.sendmail() and SMTP.send_message() now also support the UTF-8 extensions of RFC 6531, by passing SMTPUTF8 in the mail_options parameter. This is required if you want to use non-ASCII characters in the email address fields.
There are a few changes to the socket module that are worth noting. The first one is that functions with timeouts now use a monotonic clock instead of the system clock — this probably won’t impact too many people, but might save you from some maddeningly unreproducible bugs when the NTP daemon jumps in and fiddles with the system clock under your feet.
More generally applicable, socket objects now offer a sendfile() method which uses the os.sendfile() function that was exposed back in Python 3.3. This is 2-3x faster at transferring files across sockets, as the copying is done entirely within the kernel.
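For example, using a connected pair of sockets to stand in for a real network connection:

import socket

a, b = socket.socketpair()
with open("/etc/hosts", "rb") as f:   # any regular file will do
    sent = a.sendfile(f)   # uses os.sendfile() where the platform supports it
a.close()
print(b.recv(65536))
b.close()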
Also, an annoying issue with the timeout of socket.sendall() has been fixed. Previously, each time data was successfully sent the timeout clock was reset — the consequence was that there wasn’t any way for calling code to impose an overall timeout on the operation in the event that the connection was slow but made occasional progress. As of Python 3.5, however, the timeout is treated as an overall timeout for the entire operation, which is much more useful.
Finally, the backlog parameter to socket.listen() is now optional — if omitted it defaults to SOMAXCONN, which is the maximum permitted value, or 128 if that’s lower.
There are quite a few changes in the ssl module, some of which are a little esoteric, so I’ll try and be brief. I’m not holding out much hope, though.
First up is the new SSLObject class, which has been added for cases where you need to access the SSL protocol stack but you don’t want all the network IO that’s built into SSLSocket. The MemoryBIO class acts as a memory buffer to pass data in and out for this case. This would be useful for integrating SSL into an application’s wider poll loop, for example.
Next we have support for RFC 7301 application-layer protocol negotiation. This extension to TLS allows clients to negotiate which of several protocols to exchange over a single underlying secure channel at connection time, and is a key requirement for being able to run HTTP/2 over existing HTTP/1.1 SSL ports. The SSLContext class now has a set_alpn_protocols() method which application code uses to advertise its supported protocols during the TLS handshake — for example, if you want to use either HTTP/2 or HTTP/1.1 you could pass ["h2", "http/1.1"]. The SSLSocket has a corresponding selected_alpn_protocol() method which returns the protocol which was selected for the connection.
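For example (requires network access to the named host, and an OpenSSL build with ALPN support):

import socket
import ssl

ctx = ssl.create_default_context()
ctx.set_alpn_protocols(["h2", "http/1.1"])   # in order of preference

conn = ctx.wrap_socket(socket.socket(), server_hostname="example.com")
conn.connect(("example.com", 443))
print(conn.selected_alpn_protocol())   # e.g. 'http/1.1', or None
conn.close()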
And then there’s a collection of smaller enhancements:
SSLSocket.version() Added

Returns the version of the SSL/TLS protocol actually negotiated for the connection, or None if no secure connection has been established.

SSLSocket.sendfile() Added

The counterpart of the sendfile() method added to regular socket.socket objects, discussed above.

SSLSocket.send() Change

If SSLSocket.send() was called on a non-blocking socket when it would normally block, it would previously return 0. Now it raises the ssl.SSLWantReadError or ssl.SSLWantWriteError exceptions as appropriate.

cert_time_to_seconds() Now Takes UTC

The input time is now interpreted as UTC rather than local time, matching the convention used in certificates.

Finally, as with socket.sendall(), various SSLSocket methods no longer reset the timeout on a successful write: do_handshake(), read(), shutdown(), and write().
HTTP Basic Authentication is still used on the web for simple cases, as it’s easy to code and provides basic protection that may be suitable for low-risk cases over otherwise secure channels (e.g. TLS). This style of authentication demands that requests for protected resources that lack an appropriate Authorization header should trigger a 401 Unauthorized response. Some HTTP client libraries have come to depend on this behaviour, and won’t actually send an Authorization header until they see the 401 so they know that authorization is required. It turns out that Python’s urllib is one of those libraries.
The problem is that some servers don’t conform to this behaviour, often intentionally — a notable example is Github’s API, which responds with a standard 404 Not Found error instead of a 401, for rather vague hand-wavey reasons of security3. The solution to this is to pre-emptively send the Authorization header on the initial request instead of waiting for the 401 — this is explicitly anticipated in §2 of RFC 2617 where it says:

A client MAY preemptively send the corresponding Authorization header with requests for resources in that space without receipt of another challenge from the server.
To facilitate this there’s a new urllib.request.HTTPPasswordMgrWithPriorAuth class, which is similar to HTTPPasswordMgrWithDefaultRealm but which pre-emptively sends the Authorization header. Even in cases where the server responds with an appropriate 401, this approach also saves an unnecessary round-trip.
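Using it looks something like this (endpoint and credentials hypothetical):

import urllib.request

mgr = urllib.request.HTTPPasswordMgrWithPriorAuth()
mgr.add_password(None, "https://api.example.com/", "andy", "secret",
                 is_authenticated=True)   # send the header pre-emptively
opener = urllib.request.build_opener(
    urllib.request.HTTPBasicAuthHandler(mgr))
response = opener.open("https://api.example.com/user/repos")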
As well as this new class, there are a handful of smaller changes:
parse.urlencode() Supports quote_via Argument

The urlencode() function needs to URL-encode values, escaping special characters which aren’t valid in URLs. It now accepts a quote_via parameter, where you can specify a function to transform a string to a URL-safe form. The default is to use urllib.parse.quote_plus(), which encodes spaces as +.

request.urlopen() Supports context Argument

This allows an ssl.SSLContext object to be passed, to be used for HTTPS connections.

parse.urljoin() Updated

Previously it was possible to use .. enough times to move outside the root of the URL space. As suggested by §5.4 of RFC 3986, these invalid excess sections should just be ignored, and that’s what urljoin() now does in Python 3.5.

There are some small changes to modules that wrap up common coding tasks.
contextlib.redirect_stderr() Added

The counterpart of redirect_stdout(), this context manager temporarily redirects output sent to stderr. It takes an alternative file descriptor as a parameter, which could be an io.StringIO instance, for example. There’s a short example after this list.

functools.lru_cache Implemented in C

The bulk of the lru_cache() machinery is now implemented in C, which should make it significantly faster.

importlib.util.LazyLoader Added

This allows the actual load of a module to be deferred until the first attribute access. I’d say this is generally a bad idea unless you really need it, however, as any exceptions or error messages which would have occurred at import are then deferred until first use and occur in a confusing context.

module_from_spec() Added

importlib.util.module_from_spec() is now the preferred way to create a new module. The advantage over directly instantiating types.ModuleType is that it additionally sets some import-controlled attributes on the new module object based on the ModuleSpec that you pass in.
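Here’s the promised redirect_stderr() example:

import contextlib
import io
import sys

buffer = io.StringIO()
with contextlib.redirect_stderr(buffer):
    print("something went wrong", file=sys.stderr)

print("captured:", buffer.getvalue().strip())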
A few assorted tidbits for operating system features, as well as some slightly more substantial updates to pathlib and subprocess.
Firstly, the glob module now supports ** in the glob() and iglob() functions, provided you also pass recursive=True. This acts rather like * except that it also matches directory separators — in other words, it recurses into subdirectories to find matches.
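For example (directory tree hypothetical):

import glob

# recursive=True must be passed for ** to be treated specially
for path in glob.glob("/tmp/project/**/*.py", recursive=True):
    print(path)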
In the os module, urandom() now uses getrandom() on Linux and getentropy() on OpenBSD, to avoid the need to open /dev/urandom, which can fail if your process is at its filehandle limit.
There are handy new get_blocking() and set_blocking() functions which query and set the blocking mode of a file descriptor, to avoid having to fiddle around with O_NONBLOCK directly.
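For example:

import os

r, w = os.pipe()
print(os.get_blocking(r))   # True: descriptors start out blocking
os.set_blocking(r, False)
print(os.get_blocking(r))   # False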
And for anyone, like me, who’s always been slightly annoyed at how useful os.path.commonprefix() isn’t, there’s a saviour in the form of os.path.commonpath(). The issue is that commonprefix() always returns the longest common prefix string, even if that breaks the name in the middle of a directory or filename. The new commonpath() does the same thing, but always breaks the path on a directory separator, so the result will be a valid path.
>>> os.path.commonprefix(("/home/andy/myfirstfile", "/home/andy/mysecondfile"))
'/home/andy/my'
>>> os.path.commonpath(("/home/andy/myfirstfile", "/home/andy/mysecondfile"))
'/home/andy'
The shutil.move() function now takes a new parameter copy_function to specify the function to use for copying if moving items between filesystems (within the same filesystem, os.rename() is still used instead). It defaults to shutil.copy2(), which copies all file content and metadata, but something like shutil.copy() might be more appropriate if you just want the content itself copied.
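For example (paths hypothetical):

import shutil

# copy just the file contents, not metadata, if the move has to fall
# back to copy-and-delete because it crosses filesystems
shutil.move("/tmp/report.txt", "/mnt/archive/report.txt",
            copy_function=shutil.copy)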
In the signal module, the various SIG* constants (e.g. signal.SIGTERM) have been replaced by values in the signal.Signals enumeration. This allows for more convenience when logging values, etc.
The pathlib module, which you may remember was added in Python 3.4, has some handy changes. Firstly, Path instances now have a samefile() method, which indicates whether the path refers to the same physical file as another path, specified as either another Path object or just a string. This is particularly useful as it actually goes to the concrete filesystem, so it supports things like symbolic links to the same file.
>>> import os
>>> import pathlib
>>> import shutil
>>> os.makedirs("/tmp/test/one")
>>> os.makedirs("/tmp/test/two")
>>> os.makedirs("/tmp/test/three")
>>> with open("/tmp/test/one/testfile_one", "w") as fd:
... fd.write("hello, world\n")
...
13
>>> shutil.copy("/tmp/test/one/testfile_one", "/tmp/test/two/testfile_two")
'/tmp/test/two/testfile_two'
>>> os.symlink("/tmp/test/one/testfile_one", "/tmp/test/three/testfile_three")
>>>
>>> mypath = pathlib.Path("/tmp/test/one/testfile_one")
>>> # True, because it's just an obfuscated version of the same path.
>>> mypath.samefile("/tmp/test/one/../../test/one/testfile_one")
True
>>> # False, because this was a copy of the file, not the same file.
>>> mypath.samefile("/tmp/test/two/testfile_two")
False
>>> # True, because this is a symlink to the same file.
>>> mypath.samefile("/tmp/test/three/testfile_three")
True
There are a few other improvements to Path as well:
Path.mkdir() Now Takes exist_ok

Passing exist_ok=True will suppress the FileExistsError if the target already exists, similar to mkdir -p on the command-line.

Path.expanduser() Added

Returns a new Path instance with ~ and ~username tokens expanded.

Path.home() Added

Returns a Path of the user’s home directory.

Convenience I/O Methods Added

There are four new convenience methods on Path, which are read_text(), write_text(), read_bytes() and write_bytes(). These simplify read/write operations on files represented by Path objects; there’s a quick tour after this list.
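Here’s that quick tour of some of these (home directory output will obviously vary):

>>> from pathlib import Path
>>> base = Path("/tmp/demo")
>>> base.mkdir(exist_ok=True)   # no FileExistsError on a second run
>>> target = base / "greeting.txt"
>>> target.write_text("hello, world\n")
13
>>> target.read_text()
'hello, world\n'
>>> Path.home()
PosixPath('/Users/andy')
>>> Path("~andy/greeting.txt").expanduser()
PosixPath('/Users/andy/greeting.txt')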
Previously, the approach to executing a child process was typically to construct a Popen instance, set attributes on it as required and then either call the communicate() method, or manually deal with IO until you eventually detect that the child has terminated with wait() or poll() and reap the return code.
The Popen class is extremely flexible, and enables many use-cases for subprocesses, both synchronous and asynchronous. However, there are often times where you don’t need this flexibility — you just want to execute something, wait for it to finish and then check its exit status. For these cases there’s now a convenience function, subprocess.run().
This accepts most of the arguments that Popen.__init__() takes, so you still have a good deal of flexibility, but it does impose a synchronous model on calling code — it waits for the subprocess to terminate, and returns a CompletedProcess instance directly from the function to save you a separate call to recover the exit status. If you need more flexibility, there’s still the option of using the underlying Popen object directly instead, but run() is now the recommended approach if suitable.
Aside from the myriad parameters passed directly into the Popen constructor, run() has some handy conveniences of its own. For the common case of capturing all output, just pass stdout=subprocess.PIPE and stderr=subprocess.PIPE and recover the output from the stdout and stderr attributes of the CompletedProcess instance you get back. Another common case is to want to only deal with output if the command failed, and for this you can pass check=True which will convert a non-zero exit status into a CalledProcessError exception, attributes of which conveniently contain the output, if it was captured. You can specify timeout, which is passed to Popen.communicate(), but if the process does time out then run() arranges for it to be killed and reaped before re-raising the TimeoutExpired exception, which is handy.
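A sketch of these conveniences in action:

import subprocess

result = subprocess.run(["ls", "/nonexistent"],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
print(result.returncode)   # non-zero
print(result.stderr)       # the captured error output, as bytes

# check=True converts a failure into an exception instead
try:
    subprocess.run(["ls", "/nonexistent"], check=True,
                   stderr=subprocess.DEVNULL, timeout=5)
except subprocess.CalledProcessError as exc:
    print("failed with exit status", exc.returncode)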
Overall this promises to streamline a lot of common cases for spawning external commands.
Whatever didn’t fit elsewhere, but I thought was also worth a brief mention.
heapq.merge() Improvements

There’s a new optional key parameter to heapq.merge() to customise the values used to compare elements. There’s also an optional reverse parameter to invert the result of the comparison. These mirror the same options to the sorted() builtin function; there’s a brief example at the end of this list.

locale.delocalize() Added

This converts a string representing a number under the conventions of the current LC_NUMERIC locale into a normalised form which can then be parsed by, say, float().

math Constants

There are new math.inf and math.nan constants, to save you making typos in float("inf") and float("nan").

math.gcd Added

The fractions.gcd() function has been deprecated and now there’s a new math.gcd() function instead, to return the greatest common divisor of two numbers.

sqlite3.Row Class A Full Sequence

This means that reversed() iteration and slice indexing now work on these objects.
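Here’s the promised heapq.merge() example; note that each input must already be sorted consistently with the key and reverse options you pass:

import heapq

print(list(heapq.merge([5, 3, 1], [6, 4, 2], reverse=True)))
# [6, 5, 4, 3, 2, 1]
print(list(heapq.merge(["apple", "Cherry"], ["Banana"], key=str.lower)))
# ['apple', 'Banana', 'Cherry']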
So that’s Python 3.5, such a big release it’s taken four articles just to cover it all. The coroutine changes and type hinting are both big features, and among all the rest of the changes are some real gems such as the addition of os.scandir(), automatic handling of EINTR and the addition of subprocess.run(). Frankly it’s been quite a mission going through it all, and I’m still continually amazed how many changes have been squeezed into every release of this already mature language.
Anyway, I’m already looking forward to seeing what’s in 3.6, but also hopeful it perhaps won’t be quite such a big release as I’m still holding hopes of catching all the way up to 3.9 before 3.10 turns up in October. Thanks for reading!
For context, a conditional reference like (?(1)ABC|DEF) checks if match group 1 (in this example) matched successfully. If so, the conditional block matches ABC, and if not the conditional block matches DEF. ↩
My assumption is that they’re saying you can tell whether a resource exists by making a request and seeing whether you get a 401 instead of a 404, but I’ve yet to see any explanation of why they can’t simply respond with a 401 to all requests that lack an Authorization header, and only return a 404 in cases where the user is correctly authenticated for a resource in that part of the URL space. This would have the distinction of not breaking the RFC. In my opinion, you should structure a REST API so that authentication is something that can happen based on a prefix of it, and you complete that process before you even check if that URL maps to a real resource or not. But it’s just my view. ↩