Say I have a list in Python 3.X. I use list comprehension to return a subset of that list--- is there an easy/Pythonic way to "pop" that subset off of the original list (so the elements in that subset are no longer in the original list after being returned)? Thanks!
Example (I need help in defining my_func):
a = [1, 2, 2, 3, 4, 5]
a, b = my_func(a, *kwargs)
I'd then want:
a = [1, 2, 2]
b = [3, 4, 5]
Note: I do not want to pop the values off one at a time, but rather the whole subset at once. "Pop" may not be the best terminology, but I don't know what is.
The best way I can think of doing it is:
temp = [idx for idx, val in enumerate(a) if val > 2]
b = [a[i] for i in temp]
a = [val for idx,val in enumerate(a) if idx not in temp]
Obviously, I would prefer something more elegant and efficient.
I want to avoid going over the list twice.
EDIT: I think this question is unique because I am not simply concerned with splitting the list - I want to modify the original list. Splitting and assigning one of those splits to the original list variable is a possible solution, but I was hoping there might be a way to do it without explicitly assigning to the original list variable (similar to something like b.append(a.pop()))
Just filter out the unwanted items using list comprehension:
a = [1, 2, 2, 3, 4, 5]
condition = lambda x: x > 2
b = [x for x in a if condition(x)] # b == [3, 4, 5]
a = [x for x in a if not condition(x)] # a == [1, 2, 2]
Update
If you are worrying about efficiency then here is another approach which scans the list only once:
def classify(iterable, condition):
""" Returns a list that matches the condition and a list that
does not """
true_list = []
false_list = []
for item in iterable:
if condition(item):
true_list.append(item)
else:
false_list.append(item)
return true_list, false_list
a = [1, 2, 2, 3, 4, 5]
b, a = classify(a, lambda x: x > 2)
print(a) # [1, 2, 2]
print(b) # [3, 4, 5]
Update 2
I did not realize that my update looks almost the same to that of user3467349, but believe it or not, I did not cheat :-)
Keep in mind this is Python, not C, and sometimes the way it works is not necessarily intuitive. This means you have to verify your assumptions. I've done this using IPython's built-in %%timeit
magic.
a = [1,2,2,3,4,5]
%timeit b=[x for x in a if x > 2]\
1000000 loops, best of 3: 362 ns per loop
%timeit c=[x for x in a if not (x > 2)]
1000000 loops, best of 3: 371 ns per loop
So just under 800 ns for both, but we're iterating over the loop twice. Surely we don't need to do that? How about a couple of the above methods? Let's start with classify, which is pretty straightforward and only walks over the list once:
%timeit b, a = classify(a, lambda x: x > 2)
1000000 loops, best of 3: 1.89 µs per loop
Wow, even though it only walks over the loop once, it takes more than twice as long as the simple solution above, which walks it twice. Let's try a slight variation on the other solution proposed above:
%%timeit
b, c = [], []
for x in a:
b.append(x) if x > 2 else a.append(x)
1000000 loops, best of 3: 1.2 µs per loop
Better, but it's still slower than our 'naive'/'inefficient' implementation. Maybe a little different formulation would be better:
%%timeit
b, c = [], []
for x in a:
if x > 2:
b.append(x)
else:
c.append(x)
1000000 loops, best of 3: 983 ns per loop
Hmm, that seems almost the same, but is a little faster. Still doesn't beat the naive implementation. Let's get really clever, maybe make it even faster:
%%timeit
b, c = [], []
for x in a:
(b, c)[x > 2].append(x)
1000000 loops, best of 3: 1.28 µs per loop
So, what we saw is that despite our desire not to iterate the loop twice, we don't seem to be able to improve on just doing two list comprehensions. List comprehensions are sort of 'under the hood' in Python, and so for many things they are going to faster than anything you come up with.
Now, the x < 2
comparison is cheap - no function calls, simple math. There will be a point where it starts to make sense. Let's create a more expensive comparison function - we'll calculate the factorial (and do it inefficiently):
def factorial(x):
if x in (0,1):
return 1
else:
return x * factorial(x-1)
%timeit b = [x for x in a if factorial(x) > 6]
100000 loops, best of 3: 3.47 µs per loop
%timeit c = [x for x in a if not factorial(x) > 6]
100000 loops, best of 3: 3.53 µs per loop
So, obviously the times went up a good bit - now are at about 7uS for everything.
Let's try some of our other examples now:
%timeit b, c = classify(a, lambda x: factorial(x) > 6)
100000 loops, best of 3: 5.05 µs per loop
%%timeit
b, c = [], []
for x in a:
if factorial(x) > 6:
b.append(x)
else:
c.append(x)
100000 loops, best of 3: 4.01 µs per loop
%%timeit
b, c = [], []
for x in a:
(b, c)[factorial(x) > 6].append(x)
100000 loops, best of 3: 4.59 µs per loop
The lesson from all of this: When dealing with python & efficiency, it's usually a good idea to just try it out in a console, but very often the naive implementations are reasonably performant & the easiest to read. And a quick test will tell you if trying to optimize is actually worth it; you can often make it less readable AND slower if you're not careful....
Addendum: Someone commented that we need longer lists, because we're measuring function call overhead more than performance. They have a good point, but the times show about the same relationship on my machine:
In [16]: a = range(100000)
In [17]: random.shuffle(a)
In [18]: %timeit b = [x for x in a if x > 50000]
100 loops, best of 3: 5.2 ms per loop
In [19]: %timeit c = [x for x in m if not x > 50000]
100 loops, best of 3: 5.18 ms per loop
In [20]: %timeit c = [x for x in m if x <= 50000]
100 loops, best of 3: 5.35 ms per loop
In [21]: %%timeit
....: b, c = [], []
....: for x in a:
....: if x > 50000:
....: b.append(x)
....: else:
....: c.append(x)
....:
100 loops, best of 3: 12.7 ms per loop
Note that if I change the comparison to x > 2 (instead of x > 50000) the second loop sped up to about 11.4ms. But still, that's barely any faster than the naive implementation. That said, it's probably the one I'd prefer - it's still easy to read, and it's not slower (or significantly so). And code tends to get more complex over time, so when you later add another comparison, or change it to a function call, it's easier to do, and the performance will suffer less in this version than the naive version.
But again: This isn't to say 'use list comprehensions' (though they are very often a good idea), but verify your assumptions.
The one line solution (please don't actually use this):
a = [7,4,2, 1, 5 , 11, 2]
[x for x in (a.pop() for i in range(len(a))) if (lambda x: True if x> 2 \
else a.insert(0, x))(x)]
More Reasonable Solutions
In general - if you're checking to see what a value is before popping - that's an read at an index, and then a pop (not to mention the funkiness of keeping tracking of iterating over an array of changing length), which seems to make the point of not having two new lists moot.
So I would just add to one of two new lists i.e.
a = [1, 2, 2, 3, 4, 5]
b=[]; c=[];
for i in range(len(a)):
if x > 2: b.append(x)
else: c.append(x)
Of course you can technically pop from a and insert back into a you would do a.insert(0, x)
instead of c.append(x)
- but it's generally a bad idea to manipulate a list you are looping over.
Another alternative is a sorted list and a bisect i.e.
a = sorted(a)
ind = bisect.bisect(a, 2)
#Now both lists are here
a, b= a[:ind], a[ind:]
Which method is preferable really depends on what you plan to do with the lists afterwards.
If you want to 'pop' the values from the list a
and append those values to b I would follow this approach:
a = [1,2,2,3,4,5]
b = []
for i in a[:]:
if i > 2:
b.append(i)
a.remove(i)
#a = [1,2,2]
#b = [3,4,5]
This script is especially helpful if list a
was not in a particular order.