Python’s simple scoping rules occasionally hide some surprising behaviour.
Scoping in Python is pretty simple, especially in Python 2.x. Essentially you have three scopes: local, enclosing and global.
Local scope is anything defined in the same function as you. Enclosing scopes are those of the functions in which you’re defined; this only applies to functions which are lexically contained within other functions[1]. Global scope is anything at the module level. There’s also a special “builtin” scope outside of that, but let’s ignore that for now. Classes also have their own special sorts of scopes, but we’ll ignore that as well.
When you assign to a variable within a function, this counts as a declaration and the variable is created in the local scope[2] of the function, unless you use the `global` keyword to force the name to refer to one at module scope instead[3].
When you read the value of a variable, Python starts with the local scope and attempts to look up the name there. If it’s not found, it recurses up through the enclosing scopes looking for it until it reaches the module scope (and finally the magic builtin scope). This is more or less as you’d expect if you’re used to normal lexically-scoped languages.
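The lookup order can be sketched roughly as follows (Python 3; the function names and the variable `name` are illustrative assumptions, not anything from a real codebase). Each function returns whatever the lookup finds first:

```python
name = "global"


def outer():
    name = "enclosing"

    def inner_with_local():
        name = "local"
        return name  # found in the local scope

    def inner_without_local():
        return name  # not local, so found in outer()'s scope

    return inner_with_local(), inner_without_local()


def no_enclosing():
    return name  # no local or enclosing binding, so module scope is used


def uses_builtin():
    return len("abc")  # `len` is only found in the builtin scope
```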
However, if you were paying attention you’ll have noticed that I specifically said that a local scope is defined by a function. In particular, constructs such as `for` loops do not define their own scopes: they operate entirely in the local scope of the enclosing function (or module). This has some beneficial side-effects; for example, loop counters are still available once the loop has exited, which is rather handy. It also has some potential pitfalls. Take this code snippet, for example[4]:
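A minimal sketch of the kind of snippet being described, written for Python 3 and not necessarily the original listing; the names `functions` and `output` are assumptions:

```python
# Build a list of functions, each of which returns the loop variable `i`.
functions = [(lambda: i) for i in range(5)]

# Call each function in turn and join the results into one string.
output = " ".join(str(func()) for func in functions)
print(output)  # 4 4 4 4 4
```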
So, this builds a list of functions[5] and then executes each one in turn, concatenating and printing the results. Intuitively one would expect the results to be `0 1 2 3 4`, but actually we get `4 4 4 4 4`. Eh?
What’s happening is that each of the functions created is a closure whose variable `i` refers to the one in its enclosing scope, the one used in the loop. However, each iteration just updates the same loop counter in the local scope of the enclosing function (or module), and so all the functions end up with a reference to the same variable `i`. In other words, closures in Python refer directly to the enclosing scopes; they don’t create “frozen copies” of them[6].
This works fine when a closure is created by a function and then returned, because the enclosing scope is then kept alive only by the closure and is inaccessible elsewhere. Further invocations of the same function will produce new scopes and different closures. In this case, though, the functions are all defined under the same scope. So when they’re evaluated, they all return the final value of `i` as it was when the loop terminated.
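To make the contrast concrete, here is a small Python 3 sketch; `make_getter`, `shared_scope` and `separate_scopes` are hypothetical names chosen for this illustration. Functions defined in one loop share a single `i`, while each call to a factory function produces a fresh scope:

```python
def make_getter(value):
    # Each call creates a new local scope holding its own `value`.
    return lambda: value


def shared_scope():
    functions = []
    for i in range(5):
        functions.append(lambda: i)  # every lambda refers to the same `i`
    return [f() for f in functions]


def separate_scopes():
    # One call to make_getter() per iteration, hence one scope per function.
    functions = [make_getter(i) for i in range(5)]
    return [f() for f in functions]


print(shared_scope())     # [4, 4, 4, 4, 4]
print(separate_scopes())  # [0, 1, 2, 3, 4]
```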
We can illustrate this by amending the example to delete the loop counter:
Now the third line raises an exception:
NameError: global name 'i' is not defined
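A rough Python 3 variation on the same experiment, offered as a sketch only. It uses an explicit `for` loop inside a hypothetical function `demo_deleted_counter`, because in Python 3 a list comprehension’s loop variable is not visible afterwards and so could not be deleted. The exact error message differs between versions, so the sketch only checks the exception type:

```python
def demo_deleted_counter():
    functions = []
    for i in range(5):
        functions.append(lambda: i)

    # Remove the loop counter that every lambda refers to.
    del i

    try:
        functions[0]()
    except NameError:
        # The lambdas held a reference to `i`, not a copy of its value.
        return True
    return False
```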
Of course, if you use the generator expression form to defer generation of the functions until the point of invocation then everything works as you’d expect:
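A sketch of the generator form in Python 3, with illustrative function names. It also shows that the effect depends on calling each function before the generator moves on; if the generator is exhausted first, the old behaviour comes back:

```python
def call_as_generated():
    functions = ((lambda: i) for i in range(5))
    # Each function is called while the generator is paused on its own `i`.
    return [f() for f in functions]


def call_after_exhausting():
    functions = list((lambda: i) for i in range(5))
    # The generator has already finished, so every function sees the final `i`.
    return [f() for f in functions]


print(call_as_generated())      # [0, 1, 2, 3, 4]
print(call_after_exhausting())  # [4, 4, 4, 4, 4]
```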
So, all this is quite comprehensible once you understand what’s going on, but I do wonder how many people get bitten by this sort of thing when using closures in loops.
As a final note, this behaviour is the same in Python 3.x, with one wrinkle: there a list comprehension gets its own scope, so its loop variable no longer leaks into the surrounding scope, although functions created inside it still all see the final value. The other small difference with regards to scopes is the addition of the `nonlocal` keyword, which is the equivalent of `global` except that it allows updating the value of variables in enclosing scopes which sit between the local and global scopes. I believe that with regards to reading the values of such variables, however, the behaviour is unchanged.
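A short Python 3 sketch of `nonlocal`, using a hypothetical `make_counter` factory. Two functions defined in the same enclosing scope share its state, so an update made by one is visible to the other:

```python
def make_counter():
    count = 0

    def increment():
        # Without `nonlocal`, the assignment would create a new local `count`.
        nonlocal count
        count += 1
        return count

    def current():
        # Reading an enclosing variable needs no declaration.
        return count

    return increment, current


increment, current = make_counter()
increment()
increment()
print(current())  # 2
```

A second call to `make_counter()` would produce an independent pair of functions with their own `count`.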
[1] Note that this is a lexical definition of enclosure, which is to say it’s to do with where the function is defined. It’s nothing to do with where the function was called from. Unlike dynamically-scoped languages, Python gives a function no access to variables defined in the scope of a calling function.
[2] This actually extends to the entire function, which is why it’s an error to read the value of a variable assigned to later in the function even if it exists in an enclosing scope.
[3] Or the `nonlocal` keyword in Python 3.x (see the note at the end of this post).
[4] This example uses a list comprehension for concision, but the issues described would apply equally to a `for` loop.
[5] Yes, I’m using `lambda`. So sue me, it’s just an example.
[6] Actually, once you think of closures as references to a scope rather than some sort of “freeze-frame” of the state, some things are easier to understand. For example, if two functions are defined in the same enclosing scope, updates that each of them makes to the state can be felt by the other. This is especially relevant if they use Python 3’s `nonlocal` keyword (see the note at the end of this post).