I was running a piece of code that unexpectedly gave a logic error at one part of the program. When investigating the section, I created a test file to test the set of statements being run and found out an unusual bug that seems very odd.
I tested this simple code:
array = [1, 2, 2, 4, 5] # Original array
f = (x for x in array if array.count(x) == 2) # Filters original
array = [5, 6, 1, 2, 9] # Updates original to something else
print(list(f)) # Outputs filtered
And the output was:
>>> []
Yes, nothing. I was expecting the filter comprehension to get items in the array with a count of 2 and output this, but I didn't get that:
# Expected output
>>> [2, 2]
When I commented out the third line to test it once again:
array = [1, 2, 2, 4, 5] # Original array
f = (x for x in array if array.count(x) == 2) # Filters original
### array = [5, 6, 1, 2, 9] # Ignore line
print(list(f)) # Outputs filtered
The output was correct (you can test it for yourself):
>>> [2, 2]
At one point I outputted the type of the variable f
:
array = [1, 2, 2, 4, 5] # Original array
f = (x for x in array if array.count(x) == 2) # Filters original
array = [5, 6, 1, 2, 9] # Updates original
print(type(f))
print(list(f)) # Outputs filtered
And I got:
>>> <class 'generator'>
>>> []
Why is updating a list in Python changing the output of another generator variable? This seems very odd to me.
Generators are lazy, they won't be evaluated until you iterate through them. In this case that's at the point you create the
list
with the generator as input, at theprint
.Generator evaluation is "lazy" -- it doesn't get executed until you actualize it with a proper reference. With your line:
Look again at your output with the type of
f
: that object is a generator, not a sequence. It's waiting to be used, an iterator of sorts.Your generator isn't evaluated until you start requiring values from it. At that point, it uses the available values at that point, not the point at which it was defined.
Code to "make it work"
That depends on what you mean by "make it work". If you want
f
to be a filtered list, then use a list, not a generator:The root cause of the problem is that generators are lazy; variables are evaluated each time:
It iterates over the original list and evaluates the condition with the current list. In this case, 4 appeared twice in the new list, causing it to appear in the result. It only appears once in the result because it only appeared once in the original list. The 6s appear twice in the new list, but never appear in the old list and are hence never shown.
Full function introspection for the curious (the line with the comment is the important line):
To reiterate: The list to be iterated is only loaded once. Any closures in the condition or expression, however, are loaded from the enclosing scope each iteration. They are not stored in a constant.
The best solution for your problem would be to create a new variable referencing the original list and use that in your generator expression,.
Python's generator expressions are late binding (see PEP 289 -- Generator Expressions) (what the other answers call "lazy"):
That means it only evaluates the outermost
for
when creating the generator expression. So it actually binds the value with the namearray
in the "subexpression"in array
(in fact it's binding the equivalent toiter(array)
at this point). But when you iterate over the generator theif array.count
call actually refers to what is currently namedarray
.Since it's actually a
list
not anarray
I changed the variable names in the rest of the answer to be more accurate.In your first case the
list
you iterate over and thelist
you count in will be different. It's as if you used:So you check for each element in
list1
if its count inlist2
is two.You can easily verify this by modifying the second list:
If it iterated over the first list and counted in the first list it would've returned
[2, 2]
(because the first list contains two2
). If it iterated over and counted in the second list the output should be[1, 1]
. But since it iterates over the first list (containing one1
) but checks the second list (which contains two1
s) the output is just a single1
.Solution using a generator function
There are several possible solutions, I generally prefer not to use "generator expressions" if they aren't iterated over immediately. A simple generator function will suffice to make it work correctly:
And then use it like this:
Note that the PEP (see the link above) also states that for anything more complicated a full generator definition is preferrable.
A better solution using a generator function with a Counter
A better solution (avoiding the quadratic runtime behavior because you iterate over the whole array for each element in the array) would be to count (
collections.Counter
) the elements once and then do the lookup in constant time (resulting in linear time):Appendix: Using a subclass to "visualize" what happens and when it happens
It's quite easy to create a
list
subclass that prints when specific methods are called, so one can verify that it really works like that.In this case I just override the methods
__iter__
andcount
because I'm interested over which list the generator expression iterates and in which list it counts. The method bodies actually just delegate to the superclass and print something (since it usessuper
without arguments and f-strings it requires Python 3.6 but it should be easy to adapt for other Python versions):This is a simple subclass just printing when the
__iter__
andcount
method are called:You are not using a generator correctly if this is the primary use of this code. Use a list comprehension instead of a generator comprehension. Just replace the parentheses with brackets. It evaluates to a list if you don't know.
You are getting this response because of the nature of a generator. You're calling the generator when it't contents will evaluate to
[]
Generators are lazy and your newly defined
array
is used when you exhaust your generator after redefining. Therefore, the output is correct. A quick fix is to use a list comprehension by replacing parentheses()
by brackets[]
.Moving on to how better to write your logic, counting a value in a loop has quadratic complexity. For an algorithm that works in linear time, you can use
collections.Counter
to count values, and keep a copy of your original list:Notice the second version doesn't even require
old_array
and is useful if there is no need to maintain ordering of values in your original array.