In this series looking at features introduced by every version of Python 3, this one is the first of two covering release 3.4. We look at a universal install of the pip
utility, improvements to handling codecs, and the addition of the asyncio
and enum
modules, among other things.
This is the 6th of the 32 articles that currently make up the “Python 3 Releases” series.
Python 3.4 was released on March 16 2014, around 18 months after Python 3.3. That means I’m only writing this around seven years late, as opposed to my Python 3.0 overview which was twelve years behind — at this rate I should be caught up in time for the summer.
This release was mostly focused on standard library improvements and there weren’t any syntax changes. There’s a lot here to like, however, including a bevy of new modules and a whole herd of enhancements to existing ones, so let’s fire up our Python 3.4 interpreters and import some info.
For anyone who’s unaware of pip, it is the most widely used package management tool for Python, its name being a recursive acronym for pip installs packages. Originally written by Ian Bicking, creator of virtualenv, it was first called pyinstall and was designed to be a more fully-featured alternative to easy_install, which was the official package installation tool at the time.
Since pip is the tool you naturally turn to for installing Python modules and tools, this raises the question: how do you install pip itself in the first place? Typically the answer has been to install some OS package which bundles it, and once you have it you can use it to install everything else. In this release, however, there’s a new ensurepip module to perform this bootstrapping operation. It uses a private copy of pip that’s distributed with CPython, so it doesn’t require network access and can readily be used by anyone on any platform.
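As well as the python -m ensurepip command-line entry point, the module exposes a small Python API. A minimal sketch of bootstrapping from code, assuming you want to upgrade to the bundled version, might look like this:

import ensurepip

# Report which bundled version of pip ensurepip would install.
print(ensurepip.version())

# Install (or upgrade to) the bundled pip in the current environment;
# equivalent to running "python -m ensurepip --upgrade".
ensurepip.bootstrap(upgrade=True)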
This approach is part of a wider standardisation effort around distributing Python packages, and pip
was selected as a tool that’s already popular and also works well within virtual environments. Speaking of which, this release also updates the venv
module to install pip
in virtual environments by default, using ensurepip
. This was something that virtualenv
always did, and the lack of it in venv
was a serious barrier to adoption of venv
for a number of people. Additionally, the CPython installers on Windows and MacOS now default to installing pip on these platforms. You can find full details in PEP 453.
When you try newer languages like Go and Rust, coming from a heritage of C++ and the like, one of the biggest factors that leaps out at you isn’t so much the language itself but the convenience of the well-integrated standard tooling. With this release I think Python has taken another step in this direction, with standard and consistent package management on all the major platforms.
Under POSIX, file descriptors are by default inherited by child processes during a fork()
operation. This offers some concrete advantages, such as the child process automatically inheriting the stdin
, stdout
and stderr
from the parent, and also allowing the parent to create a pipe with pipe()
to communicate with the child process1.
However, this behaviour can cause confusion and bugs. For example, if the child process is a long-running daemon then an inherited file descriptor may be held open indefinitely, so the disk space associated with a deleted file is never freed. Or if the parent had a large number of open file descriptors, the child may exhaust the remaining allowance if it too tries to open a large number. This is one reason why it’s common to iterate over all file descriptors and call close()
on them after forking.
In Python 3.4, however, this behaviour has been changed so that file descriptors are created non-inheritable by default. This is implemented on POSIX systems by setting FD_CLOEXEC
on the descriptor via fcntl()
2, which causes that descriptor to be closed when any of the execX()
family of functions is called. On Windows, SetHandleInformation()
is used to clear HANDLE_FLAG_INHERIT
to much the same purpose.
Since inheritance of file descriptors is still desirable in some circumstances, the functions os.get_inheritable()
and os.set_inheritable()
can be used to query and set this behaviour on a per-filehandle basis. There are also os.get_handle_inheritable()
and os.set_handle_inheritable()
calls on Windows, if you’re using native Windows handles rather than the POSIX layer.
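A quick illustration, using a throwaway file under /tmp:

>>> import os
>>> fd = os.open("/tmp/example.txt", os.O_WRONLY | os.O_CREAT, 0o644)
>>> os.get_inheritable(fd)
False
>>> os.set_inheritable(fd, True)
>>> os.get_inheritable(fd)
True
>>> os.close(fd)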
One important aspect to note here is that when using the FD_CLOEXEC
flag, the close()
happens on the execX()
call, so if you call a plain vanilla os.fork()
and continue execution in the same script then all the descriptors will still be open. To demonstrate the action of these methods, you’ll need to do something like this (which is Unix-specific since it assumes the existence of /tmp
):
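A minimal sketch of such a demo, assuming a scratch file under /tmp and re-executing the script itself to trigger the exec() step, might look like this:

import os
import sys

FILENAME = "/tmp/inherit_demo.txt"

def run_child(fd_default, fd_inheritable):
    # This branch runs in the re-executed child: try writing to both descriptors.
    for label, fd in (("FIRST", fd_default), ("SECOND", fd_inheritable)):
        try:
            os.write(fd, (label + "\n").encode("ascii"))
        except OSError as exc:
            print("ERROR:", exc)

def run_parent():
    # Two descriptors on the same file: the first keeps the new non-inheritable
    # default, the second is explicitly marked as inheritable.
    fd_default = os.open(FILENAME, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    fd_inheritable = os.open(FILENAME, os.O_WRONLY | os.O_APPEND)
    os.write(fd_default, b"Before fork\n")
    os.set_inheritable(fd_inheritable, True)
    pid = os.fork()
    if pid == 0:
        # In the child, re-exec this script so that FD_CLOEXEC takes effect.
        os.execv(sys.executable,
                 [sys.executable, __file__, str(fd_default), str(fd_inheritable)])
    os.waitpid(pid, 0)
    with open(FILENAME) as handle:
        print("Contents of file:")
        print(handle.read(), end="")

if __name__ == "__main__":
    if len(sys.argv) == 3:
        run_child(int(sys.argv[1]), int(sys.argv[2]))
    else:
        run_parent()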
When run, you should see something like the following:
ERROR: [Errno 9] Bad file descriptor
Contents of file:
Before fork
SECOND
That first line is the output from the first attempt to write to the file, which fails. The content of the output file clearly indicates that the second write was successful.
In general I think this change is a very sensible one, as the previous default behaviour of inheriting file descriptors on POSIX systems probably took a lot of less experienced developers (and a few more experienced ones!) by surprise. It’s the sort of nasty surprise that you don’t realise is there until those odd cases where, say, you’re dealing with hundreds of open files at once and when you spawn a child process it suddenly starts complaining it’s hit the system limit on open file descriptors, and you wonder what on earth is going on. It always seems that such odd cases crop up when you have the tightest deadlines, too, so the last thing you need is to spend hours tracking down some weird file descriptor inheritance bug.
If you need to know more, PEP 446 has the lowdown, including references to real issues in various OSS projects caused by this behaviour.
The codecs
module has long been a fixture in Python, since it was introduced in (I think!) Python 2.0, released over two decades ago. It was intended as a general framework for registering and using any sort of codec, and this can be seen from the diverse range of codecs it supports. For example, as well as obvious candidates like utf-8
and ascii
, you’ve got options like base64
, hex
, zlib
and bz2
. You can even register your own with codecs.register()
.
However, most people don’t use codecs
on a frequent basis, but they do use the convenience methods str.encode()
and bytes.decode()
all the time. This can cause confusion because while the encode()
and decode()
methods provided by codecs
are generic, the convenience methods on str
and bytes
are not — these only support the limited set of text encodings that make sense for those classes.
In Python 3.4 this situation has been somewhat improved with more helpful error messages and better documentation.
Firstly, the methods codecs.encode()
and codecs.decode()
are now documented, which they weren’t previously. This is probably because they’re really just convenient wrappers for calling lookup()
and invoking the encoder object thus created, but unless you’re doing a lot of encoding/decoding with the same codec, the simplicity of their interface is probably preferable. Since these are implemented in C under the hood, there shouldn’t be much performance overhead in using these wrappers either.
>>> import codecs
>>> encoder = codecs.lookup("rot13")
>>> encoder.encode("123hello123")
('123uryyb123', 11)
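The module-level wrappers perform the lookup for you, which is usually all you need:

>>> codecs.encode("123hello123", "rot13")
'123uryyb123'
>>> codecs.decode(b"68656c6c6f", "hex")
b'hello'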
Secondly, using one of the non-text encodings without going through the codecs
module now yields a helpful error which points you in that direction.
>>> "123hello123".encode("rot13")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
LookupError: 'rot13' is not a text encoding; use codecs.encode() to handle arbitrary codecs
Finally, errors during encoding now use chained exceptions to ensure that the codec responsible for them is indicated as well as the underlying error raised by that codec.
>>> codecs.decode("abcdefgh", "hex")
Traceback (most recent call last):
File "/Users/andy/.pyenv/versions/3.4.10/encodings/hex_codec.py", line 19, in hex_decode
return (binascii.a2b_hex(input), len(input))
binascii.Error: Non-hexadecimal digit found
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
binascii.Error: decoding with 'hex' codec failed (Error: Non-hexadecimal digit found)
Hopefully all this will go some way to making things easier to grasp for anyone grappling with the nuances of codecs in Python.
This release has a number of new modules, which are discussed in the sections below. I’ve skipped ensurepip
since it’s already been discussed at the top of this article.
This release contains the new asyncio
module which provides an event loop framework for Python. I’m not going to discuss it much in this article because I already covered it a few years ago in an article that was part of my coroutines series. The other reason not to go into too much detail here is that the situation evolved fairly rapidly from Python 3.4 to 3.7, so it probably makes more sense to take a more complete look in retrospect.
Briefly, it’s nominally the successor to the asyncore module for doing asynchronous I/O, which was always promising in principle but a bit of a disappointment in practice due to a lack of flexibility. This is far from the whole story, however, as asyncio also forms the basis for the modern use of coroutines in Python.
Since I’m writing these articles with the benefit of hindsight, my strong suggestion is to either go find some other good tutorials on asyncio
that were written in the last couple of years, and which use Python 3.7 as a basis; or wait until I get around to covering Python 3.7 myself, where I’ll run through it in more detail (especially since my previous articles stopped at Python 3.5).
Enumerations are something that Python’s been lacking for some time. This is partly due to the fact that it’s not too hard to find ways to work around this omission, but they’re often a little unsatisfactory. It’s also partly due to the fact that nobody could fully agree on the best way to implement them.
Well, in Python 3.4, PEP 435 has come along to change all that, and it’s a handy little addition.
Enumerations are defined using the same syntax as a class:
from enum import Enum

class WeekDay(Enum):
MONDAY = 1
TUESDAY = 2
WEDNESDAY = 3
THURSDAY = 4
FRIDAY = 5
SATURDAY = 6
SUNDAY = 7
However, it’s important to note that this doesn’t behave like an ordinary class, as it’s built with the enum.EnumMeta
metaclass. Don’t worry too much about the details, just be aware that this is essentially a new construct which happens to reuse class syntax, and you won’t be taken by surprise later.
You’ll notice that all the enumeration members need to be assigned a value; you can’t just list the member names on their own (although read on for a nuance to this). When you have an enumeration member you can query both its name and value, and str()
and repr()
also give sensible results. See the excerpt below for an illustration of all these aspects.
>>> WeekDay.WEDNESDAY.name
'WEDNESDAY'
>>> WeekDay.WEDNESDAY.value
3
>>> str(WeekDay.FRIDAY)
'WeekDay.FRIDAY'
>>> repr(WeekDay.FRIDAY)
'<WeekDay.FRIDAY: 5>'
>>> type(WeekDay.FRIDAY)
<enum 'WeekDay'>
>>> type(WeekDay)
<class 'enum.EnumMeta'>
>>> WeekDay.THURSDAY - WeekDay.MONDAY
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for -: 'WeekDay' and 'WeekDay'
>>> WeekDay.THURSDAY.value - WeekDay.MONDAY.value
3
I did mention that every enumeration member needs to be assigned a value, but there is an enum.auto()
helper (strictly speaking a slightly later addition, arriving in Python 3.6) to assign values automatically if all you need is something unique. The excerpt below illustrates this as well as iterating through an enumeration.
>>> from enum import Enum, auto
>>> class Colour(Enum):
... RED = auto()
... GREEN = auto()
... BLUE = auto()
...
>>> print("\n".join(i.name + "=" + str(i.value) for i in Colour))
RED=1
GREEN=2
BLUE=3
Every enumeration name must be unique within a given enumeration definition, but the values can be duplicated if needed, which you can use to define aliases for values. If this isn’t desirable, the @enum.unique
decorator can enforce uniqueness, raising ValueError
if any duplicate values are found.
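For example, a definition with duplicate values is rejected at class creation time with something like the following:

>>> from enum import Enum, unique
>>> @unique
... class Mistake(Enum):
...     ONE = 1
...     TWO = 2
...     THREE = 3
...     FOUR = 3
...
Traceback (most recent call last):
  ...
ValueError: duplicate values found in <enum 'Mistake'>: FOUR -> THREE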
One thing that’s not immediately obvious from these examples is that enumeration member values may be any type, and different types may even be mixed within the same enumeration. I’m not sure how valuable this would be in practice, however.
Members can be compared by identity or equality, but comparing enumeration members to their underlying values always compares unequal. Aliases refer to the same member object, so they compare equal both by identity and by equality. Also note that when iterating through enumerations, aliases are skipped and the first definition for each value is used.
>>> class Numbers(Enum):
... ONE = 1
... UN = 1
... EIN = 1
... TWO = 2
... DEUX = 2
... ZWEI = 2
...
>>> Numbers.ONE is Numbers.UN
True
>>> Numbers.TWO == Numbers.ZWEI
True
>>> Numbers.ONE == Numbers.TWO
False
>>> Numbers.ONE is Numbers.TWO
False
>>> Numbers.ONE == 1
False
>>> list(Numbers)
[<Numbers.ONE: 1>, <Numbers.TWO: 2>]
If you really do need to include aliases in your iteration, the special __members__
dictionary can be used for that.
>>> import pprint
>>> pprint.pprint(Numbers.__members__)
mappingproxy({'DEUX': <Numbers.TWO: 2>,
'EIN': <Numbers.ONE: 1>,
'ONE': <Numbers.ONE: 1>,
'TWO': <Numbers.TWO: 2>,
'UN': <Numbers.ONE: 1>,
'ZWEI': <Numbers.TWO: 2>})
Finally, the module also provides some subclasses of Enum
which may be useful. For example, IntEnum
adds the ability to compare enumeration members with int
values, as well as with other enumeration members.
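For instance, with a hypothetical Priority enumeration:

>>> from enum import IntEnum
>>> class Priority(IntEnum):
...     LOW = 1
...     HIGH = 2
...
>>> Priority.HIGH > Priority.LOW
True
>>> Priority.LOW == 1
True
>>> sorted([3, Priority.HIGH, Priority.LOW])
[<Priority.LOW: 1>, <Priority.HIGH: 2>, 3]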
This is a bit of a whirlwind tour of what’s been written to be quite a flexible module, but hopefully it gives you an idea of its capabilities. Check out the full documentation for more details.
This release sees the addition of a new library pathlib
to manipulate filesystem paths, with semantics appropriate for different operating systems. This is intended to be a higher-level abstraction than that provided by the existing os.path
library, which itself has some functions to abstract away from the filesystem details (e.g. os.path.join()
which uses appropriate slashes to build a path).
There are common base classes across platforms, and then different subclasses for POSIX and Windows. The classes are also split into pure and concrete, where pure classes represent theoretical paths but lack any methods to interact with the concrete filesystem. The concrete equivalents have such methods, but can only be instantiated on the appropriate platform.
For reference, here is the class hierarchy:
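PurePath
    PurePosixPath
    PureWindowsPath
Path (itself a subclass of PurePath)
    PosixPath (inherits from both Path and PurePosixPath)
    WindowsPath (inherits from both Path and PureWindowsPath)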
When run on a POSIX system, the following excerpt illustrates which of the platform-specific classes can be instantiated, and also that the pure classes lack the filesystem methods that the concrete ones provide:
>>> import pathlib
>>> a = pathlib.PurePosixPath("/tmp")
>>> b = pathlib.PureWindowsPath("/tmp")
>>> c = pathlib.PosixPath("/tmp")
>>> d = pathlib.WindowsPath("/tmp")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/andy/.pyenv/versions/3.4.10/pathlib.py", line 927, in __new__
% (cls.__name__,))
NotImplementedError: cannot instantiate 'WindowsPath' on your system
>>> c.exists()
True
>>> len(list(c.iterdir()))
24
>>> a.exists()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'PurePosixPath' object has no attribute 'exists'
>>> len(list(a.iterdir()))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'PurePosixPath' object has no attribute 'iterdir'
Of course, a lot of the time you’ll just want whatever path represents the platform on which you’re running, so if you instantiate plain old Path
you’ll get the appropriate concrete representation.
>>> x = pathlib.Path("/tmp")
>>> type(x)
<class 'pathlib.PosixPath'>
One handy feature is that the division operator (slash) has been overridden so that you can append path elements with it. Note that this operator is the same on all platforms, and also you always use forward-slashes even on Windows. However, when you stringify the path, Windows paths will be given backslashes. The excerpt below illustrates these features, and also some of the manipulations that pure paths support.
>>> x = pathlib.PureWindowsPath("C:/") / "Users" / "andy"
>>> x
PureWindowsPath('C:/Users/andy')
>>> str(x)
'C:\\Users\\andy'
>>> x.parent
PureWindowsPath('C:/Users')
>>> [str(i) for i in x.parents]
['C:\\Users', 'C:\\']
>>> x.drive
'C:'
So far this is all pretty convenient, but perhaps nothing to write home about. However, there are some genuinely handy features. One is glob matching, where you can test a given path for matches against a glob-style pattern with the match()
method.
>>> x = pathlib.PurePath("a/b/c/d/e.py")
>>> x.match("*.py")
True
>>> x.match("d/*.py")
True
>>> x.match("a/*.py")
False
>>> x.match("a/*/*.py")
False
>>> x.match("a/*/*/*/*.py")
True
>>> x.match("d/?.py")
True
>>> x.match("d/??.py")
False
Then there’s relative_to()
which is handy for getting the relative path of a file to some specified parent directory. It also raises an exception if the path isn’t under the parent directory, which makes checking for errors in paths specified by the user more convenient.
>>> x = pathlib.PurePath("/one/two/three/four/five.py")
>>> x.relative_to("/one/two/three")
PurePosixPath('four/five.py')
>>> x.relative_to("/xxx")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../pathlib.py", line 819, in relative_to
.format(str(self), str(formatted)))
ValueError: '/one/two/three/four/five.py' does not start with '/xxx'
And finally there are with_name()
, with_stem()
and with_suffix()
, which are useful for manipulating parts of the filename. (Strictly speaking with_stem() is a later addition, arriving in Python 3.9, but it fits naturally alongside the other two.)
>>> x = pathlib.PurePath("/home/andy/file.md")
>>> x.with_name("newfilename.html")
PurePosixPath('/home/andy/newfilename.html')
>>> x.with_stem("newfile")
PurePosixPath('/home/andy/newfile.md')
>>> x.with_suffix(".html")
PurePosixPath('/home/andy/file.html')
>>> x.with_suffix("")
PurePosixPath('/home/andy/file')
The concrete classes add a lot more useful functionality for querying the content of directories and reading file ownership and metadata, but if you want more details I suggest you go read the excellent documentation. If you want the motivations behind some of the design decisions, go and read PEP 428.
Both simple and useful, the new statistics module contains some handy functions to calculate basic statistical measures from sets of data. All of these operations support the standard numeric types int
, float
, Decimal
and Fraction
and raise StatisticsError
on errors, such as an empty data set being passed.
The following functions for determining different forms of average value are provided in this release:
mean(): the arithmetic mean of the data, equivalent to sum(data) / len(data) except supporting generalised iterators that can only be evaluated once and don’t support len().
median(): the middle value of the data, equivalent to data[len(data) // 2] on sorted data except supporting generalised iterators. Also, if the number of items in data is even then the mean of the two middle items is returned instead of selecting one of them, so the value is not necessarily one of the actual members of the data set in this case.
median_low() and median_high(): the same as median() and each other for data sets with an odd number of elements. If the number of elements is even, these return one of the two middle elements instead of their mean as median() does, with median_low() returning the lower of the two and median_high() the higher.
median_grouped(): the median of the data treated as grouped continuous values, using a default class interval of 1, which would represent continuous values that have been rounded to the nearest integer. The method involves identifying the median interval, and then using the proportion of values above and within that interval to interpolate an estimate of the median value within it3.
mode(): the most common value in the data set, raising StatisticsError if there’s more than one value with equal-highest cardinality.
There are also functions to calculate the variance and standard deviation of the data:
pstdev() and stdev(): the population and sample standard deviation respectively.
pvariance() and variance(): the population and sample variance respectively.
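A quick excerpt illustrates a few of these:

>>> import statistics
>>> statistics.mean([1, 2, 3, 4, 4])
2.8
>>> statistics.median([1, 3, 5, 7])
4.0
>>> statistics.median_low([1, 3, 5, 7])
3
>>> statistics.median_high([1, 3, 5, 7])
5
>>> statistics.mode([1, 1, 2, 3])
1
>>> statistics.pvariance([1, 2, 3, 4])
1.25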
These operations are generally fairly simple to implement yourself, but making them operate correctly on any iterator is slightly fiddly and it’s definitely handy to have them available in the standard library. I also have a funny feeling that we’ll be seeing more additions to this library in the future beyond the fairly basic set that’s been included initially.
As you can probably tell from the name, this module is intended to help you track down where memory is being allocated in your scripts. It does this by storing the line of code that allocated every block, and offering APIs which allow your code to query which files or lines of code have allocated the most blocks, and also compare snapshots between two points in time so you can track down the source of memory leaks.
Due to the memory and CPU overhead of performing this tracing it’s not enabled by default. You can start tracking at runtime with tracemalloc.start()
, or to start it early you can set the PYTHONTRACEMALLOC
environment variable or pass the -X tracemalloc
command-line option. You can also store multiple frames of traceback against each block, at the cost of increased CPU and memory overhead, which can be helpful for tracing the source of memory allocations made by common shared code.
Once tracing is enabled you can grab a snapshot at any point with take_snapshot()
, which returns a Snapshot
instance which can be interrogated for information at any later point. Once you have a Snapshot
instance you can call statistics()
on it to get the memory allocations aggregated by source file, or broken down by line number or by specific backtrace. There's also a compare_to()
method for examining the delta in memory allocations between two points, and there are dump()
and load()
methods for saving snapshots to disk for later analysis, which could be useful for tracing code in production environments.
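Before we get to a larger example, a minimal sketch of the basic workflow (the list comprehension is just something measurable to allocate) looks like this:

import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Allocate something noticeable between the two snapshots.
data = ["x" * 1024 for _ in range(1000)]

after = tracemalloc.take_snapshot()
for stat in after.compare_to(before, "lineno")[:5]:
    print(stat)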
As a quick example of these two methods, consider the following completely artificial code:
(The example consists of four short files, memory.py, lib1.py, lib2.py and lib3.py; the listings aren’t reproduced here.)
Let’s take a quick look at the two parts of the output that executing memory.py
gives us. The first half that I get on my MacOS system is shown below — wherever you see “...
” it’s where I’ve stripped out leading paths to avoid the need for word wrapping:
---- Initial snapshot:
.../lib3.py:10: size=105 KiB, count=101, average=1068 B
memory.py:10: size=48.1 KiB, count=1, average=48.1 KiB
memory.py:9: size=12.0 KiB, count=1, average=12.0 KiB
.../lib1.py:5: size=10.3 KiB, count=102, average=104 B
.../lib3.py:11: size=848 B, count=1, average=848 B
memory.py:13: size=536 B, count=2, average=268 B
.../python3.4/random.py:253: size=536 B, count=1, average=536 B
memory.py:12: size=56 B, count=1, average=56 B
memory.py:11: size=56 B, count=1, average=56 B
.../lib3.py:6: size=32 B, count=1, average=32 B
.../lib2.py:3: size=32 B, count=1, average=32 B
I’m not going to go through all of these, but let’s pick a few examples to check what we’re seeing makes sense. Note that the results from statistics()
are always sorted in decreasing order of total memory consumption.
The first line indicates lib3.py:10
allocated memory 101 times, which is reassuring because it’s not allocating every time around the nested loop. Interesting to note that it’s one more time than the number of times around the outer loop, however, which perhaps implies there’s some allocation that was done the first time and then reused. The average allocation of 1068 bytes makes sense, since these are str
objects of 1024 characters and based on sys.getsizeof("")
on my platform each instance has an overhead of around 50 bytes.
Next up are memory.py:10
and memory.py:9
which are straightforward enough: single allocations for single strings. The sizes are such that the str
overhead is lost in rounding errors, but do note that the string using extended Unicode characters4 requires 4 bytes per character and is therefore four times larger than the byte-per-character ASCII one. If you’ve read the earlier articles in this series, you may recall that this behaviour was introduced in Python 3.3.
Skipping forward slightly, the allocation on lib3.py:11
is interesting: when we append the str
we’ve built to the list we get a single allocation of 848 bytes. I assume there’s some optimisation going on here, because if I increase the loop count the allocation count remains at one but the size increases.
The last thing I’ll call out is the two allocations on memory.py:13
. I’m not quite sure exactly what’s triggering this, but it’s some sort of optimisation — even if the loop has zero iterations then these allocations still occur, but if I comment out the loop entirely then these allocations disappear. Fascinating stuff!
Now we’ll look at the second half of the output, comparing the initial snapshot to that after the class instances are deleted:
---- Incremental snapshot:
.../lib3.py:10: size=520 B (-105 KiB), count=1 (-100), average=520 B
.../lib1.py:5: size=0 B (-10.3 KiB), count=0 (-102)
.../python3.4/tracemalloc.py:462: size=1320 B (+1320 B), count=3 (+3), average=440 B
.../python3.4/tracemalloc.py:207: size=952 B (+952 B), count=3 (+3), average=317 B
.../python3.4/tracemalloc.py:165: size=920 B (+920 B), count=3 (+3), average=307 B
.../lib3.py:11: size=0 B (-848 B), count=0 (-1)
.../python3.4/tracemalloc.py:460: size=672 B (+672 B), count=1 (+1), average=672 B
.../python3.4/tracemalloc.py:432: size=520 B (+520 B), count=2 (+2), average=260 B
memory.py:18: size=472 B (+472 B), count=1 (+1), average=472 B
.../python3.4/tracemalloc.py:53: size=472 B (+472 B), count=1 (+1), average=472 B
.../python3.4/tracemalloc.py:192: size=440 B (+440 B), count=1 (+1), average=440 B
.../python3.4/tracemalloc.py:54: size=440 B (+440 B), count=1 (+1), average=440 B
.../python3.4/tracemalloc.py:65: size=432 B (+432 B), count=6 (+6), average=72 B
.../python3.4/tracemalloc.py:428: size=432 B (+432 B), count=1 (+1), average=432 B
.../python3.4/tracemalloc.py:349: size=208 B (+208 B), count=4 (+4), average=52 B
.../python3.4/tracemalloc.py:487: size=120 B (+120 B), count=2 (+2), average=60 B
memory.py:16: size=90 B (+90 B), count=2 (+2), average=45 B
.../python3.4/tracemalloc.py:461: size=64 B (+64 B), count=1 (+1), average=64 B
memory.py:13: size=480 B (-56 B), count=1 (-1), average=480 B
.../python3.4/tracemalloc.py:275: size=56 B (+56 B), count=1 (+1), average=56 B
.../python3.4/tracemalloc.py:189: size=56 B (+56 B), count=1 (+1), average=56 B
memory.py:12: size=0 B (-56 B), count=0 (-1)
memory.py:11: size=0 B (-56 B), count=0 (-1)
.../python3.4/tracemalloc.py:425: size=48 B (+48 B), count=1 (+1), average=48 B
.../python3.4/tracemalloc.py:277: size=32 B (+32 B), count=1 (+1), average=32 B
.../lib3.py:6: size=0 B (-32 B), count=0 (-1)
.../lib2.py:3: size=0 B (-32 B), count=0 (-1)
memory.py:10: size=48.1 KiB (+0 B), count=1 (+0), average=48.1 KiB
memory.py:9: size=12.0 KiB (+0 B), count=1 (+0), average=12.0 KiB
.../python3.4/random.py:253: size=536 B (+0 B), count=1 (+0), average=536 B
Firstly, there are of course a number of allocations within tracemalloc.py
, which are the result of creating and analysing the previous snapshot. We’ll disregard these, because they depend on the details of the library implementation which we don’t have transparency into here.
Beyond this, most of the changes are as you’d expect. Interesting points to note are that one of the allocations from lib3.py:10
was not freed, and only one of the two allocations from memory.py:13
was freed. Since these were the two cases where I was a little puzzled by the apparently spurious additional allocations, I’m not particularly surprised to see these two being the ones that weren’t freed afterwards.
In a simple example like this, it’s easy to see how you could track down memory leaks and similar issues. However, I suspect in a complex codebase it could be quite a challenge to focus in on the impactful allocations with the amount of detail provided. I guess the main reason people would turn to this module is only to track down major memory leaks rather than a few KB here and there, so at that point perhaps the important allocations would stand out clearly from the background noise.
Either way, it’s certainly a welcome addition to the library!
Great stuff so far, but we’ve got plenty of library enhancements still to get through. I’ll discuss those and a few other remaining details in the next post, and I’ll also sum up my overall thoughts on this release as a whole.
So the parent process closes one end of the pipe and the child process closes the other end. If you want bidirectional communication you can do the same with another pipe, just the opposite way around. There are other ways for processes to communicate, of course, but this is one of the oldest. ↩
If you want to get technical, there’s a faster path used on platforms which support it, which is to call ioctl()
with either FIOCLEX
or FIONCLEX
to perform the same task. This is done only because it’s generally a few percent faster than the equivalent fcntl()
call, but it’s less standard. ↩
Or more concisely \( L + \frac{(\frac{n}{2})-B}{G} w \) where \(L\) is the lowest possible value from the median interval, \(n\) is the size of the data set, \(B\) is the number of items below the median interval, \(G\) is the number of items within the median interval, and \(w\) is the interval width. ↩
Specifically from the Supplementary Ideographic Plane. ↩