☑ Python 2to3: What’s New in 3.4 - Part 1

4 Apr 2021 at 3:25PM in Software

In this series looking at features introduced by every version of Python 3, this one is the first of two covering release 3.4. We look at a universal install of the pip utility, improvements to handling codecs, and the addition of the asyncio and enum modules, among other things.


Python 3.4 was released on March 16 2014, around 18 months after Python 3.3. That means I’m only writing this around seven years late, as opposed to my Python 3.0 overview which was twelve years behind — at this rate I should be caught up in time for the summer.

This release was mostly focused on standard library improvements and there weren’t any syntax changes. There’s a lot here to like, however, including a bevy of new modules and a whole herd of enhancements to existing ones, so let’s fire up our Python 3.4 interpreters and import some info.

What a Pip

For anyone who’s unaware of pip, it is the most widely used package management tool for Python, its name being a recursive acronym for pip installs packages. Written by Ian Bicking, creator of virtualenv, it was originally called pyinstall and was intended to be a more fully-featured alternative to easy_install, which was the official package installation tool at the time.

Since pip is the tool you naturally turn to for installing Python modules and tools, this always raises the question: how do you install pip for the first time? Typically the answer has been to install some OS package with it in, and once you have it installed you can use it to install everything else. In the new release, however, there’s a new ensurepip module to perform this bootstrapping operation. It uses a private copy of pip that’s distributed with CPython, so it doesn’t require network access and can readily be used by anyone on any platform.
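
As a minimal sketch of what this looks like from Python itself, the snippet below asks ensurepip which version of pip it bundles. The helper name bundled_pip_version is my own invention, and the guard is there on the assumption that some OS distributions package ensurepip separately, so it may be missing.

```python
import importlib.util


def bundled_pip_version():
    """Return the version of pip bundled with this interpreter, or None."""
    # Some distributions split ensurepip out into a separate OS package.
    if importlib.util.find_spec("ensurepip") is None:
        return None
    import ensurepip
    return ensurepip.version()


print("Bundled pip version:", bundled_pip_version())
# To actually bootstrap pip you'd typically run this from a shell instead:
#     python -m ensurepip
```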

This approach is part of a wider standardisation effort around distributing Python packages, and pip was selected as a tool that’s already popular and also works well within virtual environments. Speaking of which, this release also updates the venv module to install pip in virtual environments by default, using ensurepip. This was something that virtualenv always did, and the lack of it in venv was a serious barrier to adoption of venv for a number of people. Additionally the CPython installers on Windows and MacOS also default to installing pip on these platforms. You can find full details in PEP 453.

When you try newer languages like Go and Rust, coming from a heritage of C++ and the like, one of the biggest factors that leaps out at you isn’t so much the language itself but the convenience of the well-integrated standard tooling. With this release I think Python has taken another step in this direction, with standard and consistent package management on all the major platforms.

File Descriptor Inheritance (or Lack Thereof)

Under POSIX, file descriptors are by default inherited by child processes during a fork() operation. This offers some concrete advantages, such as the child process automatically inheriting the stdin, stdout and stderr from the parent, and also allowing the parent to create a pipe with pipe() to communicate with the child process[1].

However, this behaviour can cause confusion and bugs. For example, if the child process is a long-running daemon then an inherited file descriptor may be held open indefinitely, and the disk space associated with the file will not be freed even if the file is deleted. Or if the parent had a large number of open file descriptors, the child may hit the limit on open descriptors if it too tries to open a large number. This is one reason why it’s common to iterate over all file descriptors and call close() on them after forking.

In Python 3.4, however, this behaviour has been modified so that file descriptors created by Python are not inherited by default. This is implemented by setting FD_CLOEXEC on the descriptor via fcntl()[2] on POSIX systems, which causes that descriptor to be closed when any of the execX() family is called. On Windows, SetHandleInformation() is used to clear the HANDLE_FLAG_INHERIT flag, with much the same purpose.

Since inheritance of file descriptors is still desirable in some circumstances, the functions os.get_inheritable() and os.set_inheritable() can be used to query and set this behaviour on a per-filehandle basis. There are also os.get_handle_inheritable() and os.set_handle_inheritable() calls on Windows, if you’re using native Windows handles rather than the POSIX layer.
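
Before getting to a full fork-and-exec demonstration, here’s a minimal sketch of just the query and set calls, using a pipe as a convenient source of fresh descriptors. The variable names are my own, and the fcntl check is guarded because that module only exists on POSIX platforms.

```python
import os

# Descriptors created by Python 3.4+ are non-inheritable by default.
read_fd, write_fd = os.pipe()
default_flag = os.get_inheritable(read_fd)
print("Inheritable by default?", default_flag)

# Opt in to inheritance for just this one descriptor.
os.set_inheritable(read_fd, True)
updated_flag = os.get_inheritable(read_fd)
print("Inheritable after set_inheritable()?", updated_flag)

try:
    import fcntl  # POSIX only
except ImportError:
    fcntl = None

if fcntl is not None:
    # The write end was left alone, so its close-on-exec flag is still set.
    flags = fcntl.fcntl(write_fd, fcntl.F_GETFD)
    print("FD_CLOEXEC on write end?", bool(flags & fcntl.FD_CLOEXEC))

os.close(read_fd)
os.close(write_fd)
```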

One important aspect to note here is that when using the FD_CLOEXEC flag, the close() happens on the execX() call, so if you call a plain vanilla os.fork() and continue execution in the same script then all the descriptors will still be open. To demonstrate the action of these methods, you’ll need to do something like this (which is Unix-specific since it relies on os.fork()):

import os
import sys
import tempfile
import time

# Create script that we'll exec to test whether FD is still open.
fd, script_path = tempfile.mkstemp()
os.write(fd, b"import os\n")
os.write(fd, b"import sys\n")
os.write(fd, b"fd = int(sys.argv[1])\n")
os.write(fd, b"msg = sys.argv[2]\n")
os.write(fd, b"data = (msg + '\\n').encode('utf-8')\n")
os.write(fd, b"try:\n")
os.write(fd, b"    os.write(fd, data)\n")
os.write(fd, b"except Exception as exc:\n")
os.write(fd, b"    print('ERROR: ' + str(exc))\n")
os.close(fd)

# Create output file to which child processes will attempt to write.
fd, output_path = tempfile.mkstemp()
os.write(fd, b"Before fork\n")

# First attempt, should fail with "Bad file descriptor" as the default
# is for filehandles to not inherit over exec.
if os.fork() == 0:
    os.execl(
        sys.executable, script_path, script_path, str(fd), "FIRST"
    )
os.wait()

# Second attempt should succeed once fd is inheritable.
os.set_inheritable(fd, True)
if os.fork() == 0:
    os.execl(
        sys.executable, script_path, script_path, str(fd), "SECOND"
    )
os.wait()

# Now we re-read the file to see which attempts worked.
os.lseek(fd, 0, os.SEEK_SET)
print("Contents of file:")
print(os.read(fd, 4096).decode("utf-8"))
os.close(fd)

# Clean up temporary files.
os.remove(script_path)
os.remove(output_path)

When run, you should see something like the following:

ERROR: [Errno 9] Bad file descriptor
Contents of file:
Before fork
SECOND

That first line is the output from the first attempt to write the file, which fails. The contents of the output file clearly indicate the second write was successful.

In general I think this change is a very sensible one, as the previous behaviour of inheriting file descriptors by default on POSIX systems probably took a lot of less experienced developers (and a few more experienced ones!) by surprise. It’s the sort of nasty surprise that you don’t realise is there until those odd cases where, say, you’re dealing with hundreds of open files at once and when you spawn a child process it suddenly starts complaining it’s hit the system limit on open file descriptors and you wonder what on earth is going on. Such odd cases always seem to crop up when you have the tightest deadlines, too, so the last thing you need is to spend hours tracking down some weird file descriptor inheritance bug.

If you need to know more, PEP 446 has the lowdown, including references to real issues in various OSS projects caused by this behaviour.

Clarity on Codecs

The codecs module has long been a fixture in Python, since it was introduced in (I think!) Python 2.0, released over two decades ago. It was intended as a general framework for registering and using any sort of codec, and this can be seen from the diverse range of codecs it supports. For example, as well as obvious candidates like utf-8 and ascii, you’ve got options like base64, hex, zlib and bz2. You can even register your own with codecs.register().

However, most people don’t use codecs on a frequent basis, but they do use the convenience methods str.encode() and bytes.decode() all the time. This can cause confusion because while the encode() and decode() methods provided by codecs are generic, the convenience methods on str and bytes are not — these only support the limited set of text encodings that make sense for those classes.

In Python 3.4 this situation has been somewhat improved by more helpful error messages and improved documentation.

Firstly, the functions codecs.encode() and codecs.decode() are now documented, which they weren’t previously. This is probably because they’re really just convenient wrappers for calling lookup() and invoking the codec object thus obtained, but unless you’re doing a lot of encoding/decoding with the same codec, the simplicity of their interface is probably preferable. Since these are implemented in C under the hood, there shouldn’t be a lot of performance overhead for using these wrappers either.

>>> import codecs
>>> encoder = codecs.lookup("rot13")
>>> encoder.encode("123hello123")
('123uryyb123', 11)
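
For comparison, here’s a short sketch of the same sort of thing using the codecs.encode() and codecs.decode() wrappers directly, which return just the result rather than a tuple. The variable names are purely illustrative.

```python
import codecs

# A text-to-text codec, which str.encode() refuses to handle.
rot13 = codecs.encode("123hello123", "rot13")
print(rot13)

# A bytes-to-bytes codec: hex-encode some bytes, then decode them again.
hexed = codecs.encode(b"hello", "hex")
restored = codecs.decode(hexed, "hex")
print(hexed, restored)
```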

Secondly, using one of the non-text encodings without going through the codecs module now yields a helpful error which points you in that direction.

>>> "123hello123".encode("rot13")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: 'rot13' is not a text encoding; use codecs.encode() to handle arbitrary codecs

Finally, errors during encoding now use chained exceptions to ensure that the codec responsible for them is indicated as well as the underlying error raised by that codec.

>>> codecs.decode("abcdefgh", "hex")
Traceback (most recent call last):
  File "/Users/andy/.pyenv/versions/3.4.10/encodings/hex_codec.py", line 19, in hex_decode
    return (binascii.a2b_hex(input), len(input))
binascii.Error: Non-hexadecimal digit found

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
binascii.Error: decoding with 'hex' codec failed (Error: Non-hexadecimal digit found)

Hopefully all this will go some way to making things easier to grasp for anyone grappling with the nuances of codecs in Python.

New Modules

This release has a number of new modules, which are discussed in the sections below. I’ve skipped ensurepip since it’s already been discussed at the top of this article.

Asyncio

This release contains the new asyncio module which provides an event loop framework for Python. I’m not going to discuss it much in this article because I already covered it a few years ago in an article that was part of my coroutines series. The other reason not to go into things in too much detail here is that the situation evolved fairly rapidly from Python 3.4 to 3.7, so it probably makes more sense to have a more complete look in retrospect.

Briefly, it’s nominally the successor to the asyncore module, for doing asynchronous I/O, which was always promising in principle but a bit of a disappointment in practice due to a lack of flexibility. This is far from the whole story, however, as it also forms the basis for the modern use of coroutines in Python.

Since I’m writing these articles with the benefit of hindsight, my strong suggestion is to either go find some other good tutorials on asyncio that were written in the last couple of years, and which use Python 3.7 as a basis; or wait until I get around to covering Python 3.7 myself, where I’ll run through in more detail (especially since my previous articles stopped at Python 3.5).

Enum

Enumerations are something that Python’s been lacking for some time. This is partly due to the fact that it’s not too hard to find ways to work around this omission, but they’re often a little unsatisfactory. It’s also partly due to the fact that nobody could fully agree on the best way to implement them.

Well in Python 3.4 PEP 435 has come along to change all that, and it’s a handy little addition.

Enumerations are defined using the same syntax as a class:

from enum import Enum

class WeekDay(Enum):
    MONDAY = 1
    TUESDAY = 2
    WEDNESDAY = 3
    THURSDAY = 4
    FRIDAY = 5
    SATURDAY = 6
    SUNDAY = 7

However, it’s important to note that this isn’t a normal class, as it’s created by the enum.EnumMeta metaclass. Don’t worry too much about the details, just be aware that this behaves differently from ordinary classes despite using the same syntax, and you won’t be taken by surprise later.

You’ll notice that all the enumeration members need to be assigned a value; you can’t just list the member names on their own (although read on for a nuance to this). When you have an enumeration value you can query both its name and value, and also str and repr have sensible values. See the excerpt below for an illustration of all these aspects.

>>> WeekDay.WEDNESDAY.name
'WEDNESDAY'
>>> WeekDay.WEDNESDAY.value
3
>>> str(WeekDay.FRIDAY)
'WeekDay.FRIDAY'
>>> repr(WeekDay.FRIDAY)
'<WeekDay.FRIDAY: 5>'
>>> type(WeekDay.FRIDAY)
<enum 'WeekDay'>
>>> type(WeekDay)
<class 'enum.EnumMeta'>
>>> WeekDay.THURSDAY - WeekDay.MONDAY
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for -: 'WeekDay' and 'WeekDay'
>>> WeekDay.THURSDAY.value - WeekDay.MONDAY.value
3

I did mention that every enumeration member needs a value, but there is an enum.auto() helper to automatically assign values if all you need is something unique (note that this helper only arrived later, in Python 3.6). The excerpt below illustrates this as well as iterating through an enumeration.

>>> from enum import Enum, auto
>>> class Colour(Enum):
...     RED = auto()
...     GREEN = auto()
...     BLUE = auto()
...
>>> print("\n".join(i.name + "=" + str(i.value) for i in Colour))
RED=1
GREEN=2
BLUE=3
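
Since enum.auto() was only added in Python 3.6, a rough equivalent that works on Python 3.4 itself is the functional API, which assigns values counting up from 1. This is just a sketch of one alternative rather than a recommendation, and the pairs variable is only there for illustration.

```python
from enum import Enum

# Names are given as a whitespace-separated string; values count up from 1.
Colour = Enum("Colour", "RED GREEN BLUE")

pairs = [(member.name, member.value) for member in Colour]
print(pairs)
```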

Every enumeration name must be unique within a given enumeration definition, but the values can be duplicated if needed, which you can use to define aliases for values. If this isn’t desirable, the @enum.unique decorator can enforce uniqueness, raising a ValueError if any duplicate values are found.
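
As a small sketch of the decorator in action, the following hypothetical Status enumeration deliberately contains an alias, so the class definition itself fails.

```python
from enum import Enum, unique

try:
    @unique
    class Status(Enum):
        ACTIVE = 1
        ENABLED = 1  # An alias for ACTIVE, which @unique rejects.
        DISABLED = 2
except ValueError as exc:
    error = str(exc)
    print("Rejected:", error)
else:
    error = None
```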

One thing that’s not immediately obvious from these examples is that enumeration member values may be any type and different types may even be mixed within the same enumeration. I’m not sure how valuable this would be to do in practice, however.

Values can be compared by identity or equality, but comparing enumeration members to their underlying types always returns not equal. Even when comparing by identity, aliases for the same underlying value compare equal. Also note that when iterating through enumerations, aliases are skipped and the first definition for each value is used.

>>> class Numbers(Enum):
...     ONE = 1
...     UN = 1
...     EIN = 1
...     TWO = 2
...     DEUX = 2
...     ZWEI = 2
...
>>> Numbers.ONE is Numbers.UN
True
>>> Numbers.TWO == Numbers.ZWEI
True
>>> Numbers.ONE == Numbers.TWO
False
>>> Numbers.ONE is Numbers.TWO
False
>>> Numbers.ONE == 1
False
>>> list(Numbers)
[<Numbers.ONE: 1>, <Numbers.TWO: 2>]

If you really do need to include aliases in your iteration, the special __members__ dictionary can be used for that.

>>> import pprint
>>> pprint.pprint(Numbers.__members__)
mappingproxy({'DEUX': <Numbers.TWO: 2>,
              'EIN': <Numbers.ONE: 1>,
              'ONE': <Numbers.ONE: 1>,
              'TWO': <Numbers.TWO: 2>,
              'UN': <Numbers.ONE: 1>,
              'ZWEI': <Numbers.TWO: 2>})

Finally, the module also provides some subclasses of Enum which may be useful. For example, IntEnum is one which adds the ability to compare enumeration values with int as well as other enumeration values.
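
To illustrate the difference, here’s a brief sketch contrasting a plain Enum with an IntEnum. The Plain and Number classes are made up for the purpose.

```python
from enum import Enum, IntEnum


class Plain(Enum):
    ONE = 1
    TWO = 2


class Number(IntEnum):
    ONE = 1
    TWO = 2


print(Plain.ONE == 1)           # False: plain members never equal raw values
print(Number.ONE == 1)          # True: IntEnum members are also ints
print(Number.ONE < Number.TWO)  # True: ordering comes from int
print(Number.TWO + 1)           # Arithmetic yields a plain int
```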

This is a bit of a whirlwind tour of what’s been written to be quite a flexible module, but hopefully it gives you an idea of its capabilities. Check out the full documentation for more details.

Pathlib

This release sees the addition of a new library pathlib to manipulate filesystem paths, with semantics appropriate for different operating systems. This is intended to be a higher-level abstraction than that provided by the existing os.path library, which itself has some functions to abstract away from the filesystem details (e.g. os.path.join() which uses appropriate slashes to build a path).

There are common base classes across platforms, and then different subclasses for POSIX and Windows. The classes are also split into pure and concrete, where pure classes represent theoretical paths but lack any methods to interact with the concrete filesystem. The concrete equivalents have such methods, but can only be instantiated on the appropriate platform.

For reference, here is the class hierarchy:

[Figure: pathlib class structure]
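
The same relationships can be checked directly with issubclass(), which is a quick way to see how the pure and concrete classes fit together. This is only a sketch, and the variable names are my own.

```python
import pathlib

# Every class in the module ultimately derives from PurePath.
all_classes = (pathlib.PurePosixPath, pathlib.PureWindowsPath, pathlib.Path,
               pathlib.PosixPath, pathlib.WindowsPath)
all_pure = all(issubclass(cls, pathlib.PurePath) for cls in all_classes)

# Concrete classes derive from both Path and the matching pure flavour.
posix_is_concrete = issubclass(pathlib.PosixPath, pathlib.Path)
posix_is_pure_posix = issubclass(pathlib.PosixPath, pathlib.PurePosixPath)
windows_is_pure_windows = issubclass(pathlib.WindowsPath,
                                     pathlib.PureWindowsPath)

# Pure classes are not concrete, hence no filesystem methods on them.
pure_is_concrete = issubclass(pathlib.PurePosixPath, pathlib.Path)

print(all_pure, posix_is_concrete, posix_is_pure_posix,
      windows_is_pure_windows, pure_is_concrete)
```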

When run on a POSIX system, the following excerpt illustrates which of the platform-specific classes can be instantiated, and also that the pure classes lack the filesystem methods that the concrete ones provide:

>>> import pathlib
>>> a = pathlib.PurePosixPath("/tmp")
>>> b = pathlib.PureWindowsPath("/tmp")
>>> c = pathlib.PosixPath("/tmp")
>>> d = pathlib.WindowsPath("/tmp")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/andy/.pyenv/versions/3.4.10/pathlib.py", line 927, in __new__
    % (cls.__name__,))
NotImplementedError: cannot instantiate 'WindowsPath' on your system
>>> c.exists()
True
>>> len(list(c.iterdir()))
24
>>> a.exists()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'PurePosixPath' object has no attribute 'exists'
>>> len(list(a.iterdir()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'PurePosixPath' object has no attribute 'iterdir'

Of course, a lot of the time you’ll just want whatever path represents the platform on which you’re running, so if you instantiate plain old Path you’ll get the appropriate concrete representation.

>>> x = pathlib.Path("/tmp")
>>> type(x)
<class 'pathlib.PosixPath'>

One handy feature is that the division operator (slash) has been overridden so that you can append path elements with it. Note that this operator is the same on all platforms, and also you always use forward-slashes even on Windows. However, when you stringify the path, Windows paths will be given backslashes. The excerpt below illustrates these features, and also some of the manipulations that pure paths support.

>>> x = pathlib.PureWindowsPath("C:/") / "Users" / "andy"
>>> x
PureWindowsPath('C:/Users/andy')
>>> str(x)
'C:\\Users\\andy'
>>> x.parent
PureWindowsPath('C:/Users')
>>> [str(i) for i in x.parents]
['C:\\Users', 'C:\\']
>>> x.drive
'C:'

So far it’s pretty handy but perhaps nothing to write home about. However, there are some more powerful features. One is glob matching, where you can test a given path for matches against a glob-style pattern with the match() method.

>>> x = pathlib.PurePath("a/b/c/d/e.py")
>>> x.match("*.py")
True
>>> x.match("d/*.py")
True
>>> x.match("a/*.py")
False
>>> x.match("a/*/*.py")
False
>>> x.match("a/*/*/*/*.py")
True
>>> x.match("d/?.py")
True
>>> x.match("d/??.py")
False

Then there’s relative_to() which is handy for getting the relative path of a file to some specified parent directory. It also raises an exception if the path isn’t under the parent directory, which makes checking for errors in paths specified by the user more convenient.

>>> x = pathlib.PurePath("/one/two/three/four/five.py")
>>> x.relative_to("/one/two/three")
PurePosixPath('four/five.py')
>>> x.relative_to("/xxx")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../pathlib.py", line 819, in relative_to
    .format(str(self), str(formatted)))
ValueError: '/one/two/three/four/five.py' does not start with '/xxx'
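
As a sketch of that sort of check, the hypothetical is_within() helper below wraps relative_to() to report whether one path sits under another. One caveat worth labelling: pure paths never consult the filesystem, so the comparison is purely lexical and ".." components are not collapsed, as the last call shows.

```python
import pathlib


def is_within(base, candidate):
    """Return True if candidate lies lexically under base."""
    try:
        pathlib.PurePosixPath(candidate).relative_to(base)
    except ValueError:
        return False
    return True


print(is_within("/one/two", "/one/two/three/four.py"))     # True
print(is_within("/one/two", "/xxx/four.py"))               # False
# Lexical only: this escapes /one/two via "..", yet still reports True.
print(is_within("/one/two", "/one/two/../../etc/passwd"))  # True
```

For anything security-sensitive you’d presumably want to resolve the path with a concrete class first, rather than rely on this alone.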

And finally there’s with_name(), with_stem() and with_suffix() which are useful for manipulating parts of the filename (although note that with_stem() was only added later, in Python 3.9).

>>> x = pathlib.PurePath("/home/andy/file.md")
>>> x.with_name("newfilename.html")
PurePosixPath('/home/andy/newfilename.html')
>>> x.with_stem("newfile")
PurePosixPath('/home/andy/newfile.md')
>>> x.with_suffix(".html")
PurePosixPath('/home/andy/file.html')
>>> x.with_suffix("")
PurePosixPath('/home/andy/file')

The concrete classes add a lot more useful functionality for querying the content of directories and reading file ownership and metadata, but if you want more details I suggest you go read the excellent documentation. If you want the motivations behind some of the design decisions, go and read PEP 428.
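
To give a flavour of the concrete classes, here’s a minimal sketch which builds a few files in a temporary directory and then queries them. The file names are arbitrary, and it sticks to methods which I believe were present in the initial Python 3.4 version of the module.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "docs").mkdir()
    for name in ("a.py", "b.py", "notes.txt"):
        # open() mirrors the built-in; write_text() only arrived in 3.5.
        with (root / name).open("w") as handle:
            handle.write("x = 1\n")

    # glob() yields matching children, iterdir() yields all of them.
    python_files = sorted(p.name for p in root.glob("*.py"))
    directories = sorted(p.name for p in root.iterdir() if p.is_dir())
    size = (root / "a.py").stat().st_size

print(python_files, directories, size)
```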

Statistics

Both simple and useful, this new module contains some handy functions to calculate basic statistical measures from sets of data. All of these operations support the standard numeric types int, float, Decimal and Fraction and raise StatisticsError on errors, such as an empty data set being passed.

The following functions for determining different forms of average value are provided in this release:

mean()
Broadly equivalent to sum(data) / len(data) except supporting generalised iterators that can only be evaluated once and don’t support len().
median()
Broadly equivalent to sorted(data)[len(data) // 2] except supporting generalised iterators. Also, if the number of items in data is even then the mean of the two middle items is returned instead of selecting one of them, so the value is not necessarily one of the actual members of the data set in this case.
median_low() and median_high()
These are identical to median() and each other for data sets with an odd number of elements. If the number of elements is even, these return one of the two middle elements instead of their mean as median() does, with median_low() returning the lower of the two and median_high() the higher.
median_grouped()
This function implements the median of continuous data based on the frequency of values in fixed-width groups. Each value is interpreted as the midpoint of an interval, and the width of that interval is passed as the second argument. If omitted, the interval defaults to 1, which would represent continuous values that have been rounded to the nearest integer. The method involves identifying the median interval, and then using the number of values below and within that interval to interpolate an estimate of the median value within it[3].
mode()
Returns the most commonly occurring value within the data set, or raises StatisticsError if there’s more than one value with equal-highest cardinality.
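
Here’s a short sketch exercising these functions, along with a hand-rolled version of the interpolation described for median_grouped() so you can see the arithmetic. The grouped_median() helper and its variable names are my own, and it’s only intended to agree with the library on simple inputs like these.

```python
import statistics

even = [1, 3, 5, 7]
print(statistics.mean(even))
print(statistics.median(even))       # Mean of the middle pair, 3 and 5
print(statistics.median_low(even))   # Lower of the middle pair
print(statistics.median_high(even))  # Higher of the middle pair
print(statistics.mode([1, 1, 2, 3, 3, 3, 3, 4]))

# mean() also copes with an iterator that can only be consumed once.
print(statistics.mean(iter([1, 2, 3, 4, 4])))


def grouped_median(data, interval=1):
    """Hand-rolled sketch of the grouped median interpolation."""
    values = sorted(data)
    n = len(values)
    midpoint = values[n // 2]          # Midpoint of the median interval
    lower = midpoint - interval / 2    # Lowest value in that interval
    below = sum(1 for v in values if v < midpoint)
    within = values.count(midpoint)
    return lower + interval * (n / 2 - below) / within


for sample in ([52, 52, 53, 54], [1, 3, 3, 5, 7]):
    print(grouped_median(sample), statistics.median_grouped(sample))
```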

There are also functions to calculate the variance and standard deviation of the data:

pstdev() and stdev()
These calculate the population and sample standard deviation respectively.
pvariance() and variance()
These calculate the population and sample variance respectively.
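
As a quick sketch of the difference between the population and sample flavours, the data set below has a mean of 5 and squared deviations summing to 32, so the population variance divides that by 8 and the sample variance by 7. The data is arbitrary, chosen just to give round numbers.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

population_variance = statistics.pvariance(data)  # Divides by n
sample_variance = statistics.variance(data)       # Divides by n - 1
population_sd = statistics.pstdev(data)
sample_sd = statistics.stdev(data)
print(population_variance, sample_variance, population_sd, sample_sd)

# An empty data set is reported with StatisticsError.
try:
    statistics.mean([])
except statistics.StatisticsError as exc:
    empty_error = exc
    print("Error:", exc)
```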

These operations are generally fairly simple to implement yourself, but making them operate correctly on any iterator is slightly fiddly and it’s definitely handy to have them available in the standard library. I also have a funny feeling that we’ll be seeing more additions to this library in the future beyond the fairly basic set that’s been included initially.

Tracemalloc

As you can probably tell from the name, this module is intended to help you track down where memory is being allocated in your scripts. It does this by storing the line of code that allocated every block, and offering APIs which allow your code to query which files or lines of code have allocated the most blocks, and also compare snapshots between two points in time so you can track down the source of memory leaks.

Due to the memory and CPU overhead of performing this tracing it’s not enabled by default. You can start tracking at runtime with tracemalloc.start(), or to start it early you can pass the PYTHONTRACEMALLOC environment variable or -X tracemalloc command-line option. You can also store multiple frames of traceback against each block, at the cost of increased CPU and memory overhead, which can be helpful for tracing the source of memory allocations made by common shared code.
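
As a minimal sketch of enabling tracing at runtime with a deeper traceback, the snippet below asks for up to five frames per allocation and then prints the traceback of the largest allocation site. The payload list is just an arbitrary way to allocate something noticeable.

```python
import tracemalloc

# Store up to 5 frames of traceback against each allocated block.
tracemalloc.start(5)
tracing = tracemalloc.is_tracing()
frame_limit = tracemalloc.get_traceback_limit()

# Allocate something noticeable while tracing is active.
payload = [bytes(1000) for _ in range(100)]

snapshot = tracemalloc.take_snapshot()
top = snapshot.statistics("traceback")[0]
print("Largest allocation site:", top.size, "bytes in", top.count, "blocks")
for line in top.traceback.format():
    print(line)

# Stop tracing to remove the overhead again.
tracemalloc.stop()
```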

Once tracing is enabled you can grab a snapshot at any point with take_snapshot(), which returns a Snapshot instance which can be interrogated for information at any later point. Once you have a Snapshot instance you can call statistics() on it to get the memory allocations aggregated by source file, or broken down by line number or specific traceback. There’s also a compare_to() method for examining the delta in memory allocations between two points, and there are dump() and load() methods for saving snapshots to disk for later analysis, which could be useful for tracing code in production environments.

As a quick example of these two methods, consider the following completely artificial code:

memory.py
import tracemalloc

import lib1
import lib2
import lib3

tracemalloc.start()

mystring1 = "abc" * 4096
mystring2 = "\U0002000B\U00020016\U00020017" * 4096
foo = lib1.Foo()
bar = lib2.Bar()
baz = lib3.Baz()

snapshot1 = tracemalloc.take_snapshot()
print("---- Initial snapshot:")
for entry in snapshot1.statistics("lineno"):
    print(entry)

del foo, bar, baz
snapshot2 = tracemalloc.take_snapshot()
print("\n---- Incremental snapshot:")
for entry in snapshot2.compare_to(snapshot1, "lineno"):
    print(entry)

lib1.py
import os

class Foo:
    def __init__(self):
        self.entropy_pool = [os.urandom(64) for i in range(100)]

lib2.py
class Bar:
    def __init__(self):
        self.name = "instance"
        self.id = 12345

lib3.py
import random
import string

class Baz:
    def __init__(self):
        self.values = []
        for i in range(100):
            buffer = ""
            for j in range(1024):
                buffer += random.choice(string.ascii_lowercase)
            self.values.append(buffer)

Let’s take a quick look at the two parts of the output that executing memory.py gives us. The first half that I get on my MacOS system is shown below — wherever you see “...” it’s where I’ve stripped out leading paths to avoid the need for word wrapping:

---- Initial snapshot:
.../lib3.py:10: size=105 KiB, count=101, average=1068 B
memory.py:10: size=48.1 KiB, count=1, average=48.1 KiB
memory.py:9: size=12.0 KiB, count=1, average=12.0 KiB
.../lib1.py:5: size=10.3 KiB, count=102, average=104 B
.../lib3.py:11: size=848 B, count=1, average=848 B
memory.py:13: size=536 B, count=2, average=268 B
.../python3.4/random.py:253: size=536 B, count=1, average=536 B
memory.py:12: size=56 B, count=1, average=56 B
memory.py:11: size=56 B, count=1, average=56 B
.../lib3.py:6: size=32 B, count=1, average=32 B
.../lib2.py:3: size=32 B, count=1, average=32 B

I’m not going to go through all of these, but let’s pick a few examples to check what we’re seeing makes sense. Note that the results from statistics() are always sorted in decreasing order of total memory consumption.

The first line indicates lib3.py:10 allocated memory 101 times, which is reassuring because it’s not allocating every time around the nested loop. Interesting to note that it’s one more time than the number of times around the outer loop, however, which perhaps implies there’s some allocation that was done the first time and then reused. The average allocation of 1068 bytes makes sense, since these are str objects of 1024 characters and based on sys.getsizeof("") on my platform each instance has an overhead of around 50 bytes.

Next up are memory.py:10 and memory.py:9 which are straightforward enough: single allocations for single strings. The sizes are such that the str overhead is lost in rounding errors, but do note that the string using extended Unicode characters[4] requires 4 bytes per character and is therefore four times larger than the byte-per-character ASCII one. If you’ve read the earlier articles in this series, you may recall that this behaviour was introduced in Python 3.3.

Skipping forward slightly, the allocation on lib3.py:11 is interesting: when we append the str we’ve built to the list we get a single allocation of 848 bytes. I assume there’s some optimisation going on here, because if I increase the loop count the allocation count remains at one but the size increases.

The last thing I’ll call out is the two allocations on memory.py:13. I’m not quite sure exactly what’s triggering this, but it’s some sort of optimisation — even if the loop has zero iterations then these allocations still occur, but if I comment out the loop entirely then these allocations disappear. Fascinating stuff!

Now we’ll look at the second half of the output, comparing the initial snapshot to that after the class instances are deleted:

---- Incremental snapshot:
.../lib3.py:10: size=520 B (-105 KiB), count=1 (-100), average=520 B
.../lib1.py:5: size=0 B (-10.3 KiB), count=0 (-102)
.../python3.4/tracemalloc.py:462: size=1320 B (+1320 B), count=3 (+3), average=440 B
.../python3.4/tracemalloc.py:207: size=952 B (+952 B), count=3 (+3), average=317 B
.../python3.4/tracemalloc.py:165: size=920 B (+920 B), count=3 (+3), average=307 B
.../lib3.py:11: size=0 B (-848 B), count=0 (-1)
.../python3.4/tracemalloc.py:460: size=672 B (+672 B), count=1 (+1), average=672 B
.../python3.4/tracemalloc.py:432: size=520 B (+520 B), count=2 (+2), average=260 B
memory.py:18: size=472 B (+472 B), count=1 (+1), average=472 B
.../python3.4/tracemalloc.py:53: size=472 B (+472 B), count=1 (+1), average=472 B
.../python3.4/tracemalloc.py:192: size=440 B (+440 B), count=1 (+1), average=440 B
.../python3.4/tracemalloc.py:54: size=440 B (+440 B), count=1 (+1), average=440 B
.../python3.4/tracemalloc.py:65: size=432 B (+432 B), count=6 (+6), average=72 B
.../python3.4/tracemalloc.py:428: size=432 B (+432 B), count=1 (+1), average=432 B
.../python3.4/tracemalloc.py:349: size=208 B (+208 B), count=4 (+4), average=52 B
.../python3.4/tracemalloc.py:487: size=120 B (+120 B), count=2 (+2), average=60 B
memory.py:16: size=90 B (+90 B), count=2 (+2), average=45 B
.../python3.4/tracemalloc.py:461: size=64 B (+64 B), count=1 (+1), average=64 B
memory.py:13: size=480 B (-56 B), count=1 (-1), average=480 B
.../python3.4/tracemalloc.py:275: size=56 B (+56 B), count=1 (+1), average=56 B
.../python3.4/tracemalloc.py:189: size=56 B (+56 B), count=1 (+1), average=56 B
memory.py:12: size=0 B (-56 B), count=0 (-1)
memory.py:11: size=0 B (-56 B), count=0 (-1)
.../python3.4/tracemalloc.py:425: size=48 B (+48 B), count=1 (+1), average=48 B
.../python3.4/tracemalloc.py:277: size=32 B (+32 B), count=1 (+1), average=32 B
.../lib3.py:6: size=0 B (-32 B), count=0 (-1)
.../lib2.py:3: size=0 B (-32 B), count=0 (-1)
memory.py:10: size=48.1 KiB (+0 B), count=1 (+0), average=48.1 KiB
memory.py:9: size=12.0 KiB (+0 B), count=1 (+0), average=12.0 KiB
.../python3.4/random.py:253: size=536 B (+0 B), count=1 (+0), average=536 B

Firstly, there are of course a number of allocations within tracemalloc.py, which are the result of creating and analysing the previous snapshot. We’ll disregard these, because they depend on the details of the library implementation which we don’t have transparency into here.

Beyond this, most of the changes are as you’d expect. Interesting points to note are that one of the allocations from lib3.py:10 was not freed, and only one of the two allocations from memory.py:13 was freed. Since these were the two cases where I was a little puzzled by the apparently spurious additional allocations, I’m not particularly surprised to see these two being the ones that weren’t freed afterwards.

In a simple example like this, it’s easy to see how you could track down memory leaks and similar issues. However, I suspect in a complex codebase it could be quite a challenge to focus in on the impactful allocations with the amount of detail provided. I guess the main reason people would turn to this module is only to track down major memory leaks rather than a few KB here and there, so at that point perhaps the important allocations would stand out clearly from the background noise.

Either way, it’s certainly a welcome addition to the library!

Conclusions

Great stuff so far, but we’ve got plenty of library enhancements still to get through. I’ll discuss those and a few other remaining details in the next post, and I’ll also sum up my overall thoughts on this release as a whole.


  1. So the parent process closes one end of the pipe and the child process closes the other end. If you want bidirectional communication you can do the same with another pipe, just the opposite way around. There are other ways for processes to communicate, of course, but this is one of the oldest. 

  2. If you want to get technical there’s a faster path used on platforms which support it which is to call ioctl() with either FIOCLEX or FIONCLEX to perform the same task. This is only because it’s generally a few percent faster than the equivalent fcntl() call, but less standard. 

  3. Or more concisely $L + h \frac{n/2 - c}{f}$ where $L$ is the lowest possible value from the median interval, $n$ is the size of the data set, $c$ is the number of items below the median interval, $f$ is the number of items within the median interval, and $h$ is the interval width. 

  4. Specifically from the Supplementary Ideographic Plane

Photo by David Clode on Unsplash