How to remove duplicate items from a list using list comprehension?

Posted 2019-01-09 10:31

Question:

How can I remove duplicate items from a list using a list comprehension? I have the following code:

a = [1, 2, 3, 3, 5, 9, 6, 2, 8, 5, 2, 3, 5, 7, 3, 5, 8]
b = []
b = [item for item in a if item not in b]

But it doesn't work; it just produces an identical list. Why is it producing an identical list?

Answer 1:

It produces an identical list because b contains no elements while the comprehension runs. What you want is this:

>>> a = [1, 2, 3, 3, 5, 9, 6, 2, 8, 5, 2, 3, 5, 7, 3, 5, 8]
>>> b = []
>>> [b.append(item) for item in a if item not in b]
[None, None, None, None, None, None, None, None]
>>> b
[1, 2, 3, 5, 9, 6, 8, 7]
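
The comprehension above is run only for its side effect, and it builds a throwaway list of None values along the way. A minimal sketch of the same idea as a plain loop, assuming the items are hashable so that a set can track what has already been kept (the name seen is hypothetical and not from the original answer):

```python
a = [1, 2, 3, 3, 5, 9, 6, 2, 8, 5, 2, 3, 5, 7, 3, 5, 8]

b = []        # unique items, in order of first appearance
seen = set()  # hypothetical helper: items already copied into b

for item in a:
    if item not in seen:  # set membership test instead of scanning b
        seen.add(item)
        b.append(item)

print(b)  # [1, 2, 3, 5, 9, 6, 8, 7]
```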


Answer 2:

If you don't mind using a technique other than a list comprehension, you can use a set:

>>> a = [1, 2, 3, 3, 5, 9, 6, 2, 8, 5, 2, 3, 5, 7, 3, 5, 8]
>>> b = list(set(a))
>>> print(b)
[1, 2, 3, 5, 6, 7, 8, 9]
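
A set does not promise any particular order, so the sorted-looking result above should not be relied on. If the order of first appearance matters, one possible sketch is to sort the set by each value's first index in a (this is an addition, not part of the original answer):

```python
a = [1, 2, 3, 3, 5, 9, 6, 2, 8, 5, 2, 3, 5, 7, 3, 5, 8]

# a.index(x) is the position of the first occurrence of x in a,
# so sorting by it restores the order of first appearance.
b = sorted(set(a), key=a.index)

print(b)  # [1, 2, 3, 5, 9, 6, 8, 7]
```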


Answer 3:

The list is unchanged because b starts out empty, so the condition item not in b is always True. Only after the whole list has been generated is the new, non-empty list assigned to the variable b.
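
A small sketch that makes this evaluation order visible. The name result is used here only for illustration:

```python
a = [1, 2, 3, 3, 5, 9, 6, 2, 8, 5, 2, 3, 5, 7, 3, 5, 8]
b = []

# The right-hand side is evaluated in full while b is still the empty list,
# so the filter never rejects anything.
result = [item for item in a if item not in b]

print(b == [])      # True: b was not touched while the comprehension ran
print(result == a)  # True: every item passed the filter

b = result  # only now does the name b refer to the new list
```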



Answer 4:

Use the keys of a dict built with the values in a as its keys. Wrapping the result in list() gives a real list on Python 3, where keys() returns a view:

b = list(dict([(i, 1) for i in a]).keys())

Or use a set:

b = [i for i in set(a)]
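
On Python 3.7 and later, dicts keep insertion order, so the dict approach can also preserve the order of first appearance. A short sketch using dict.fromkeys, which is a swapped-in variant and not the code from the answer above:

```python
a = [1, 2, 3, 3, 5, 9, 6, 2, 8, 5, 2, 3, 5, 7, 3, 5, 8]

# dict.fromkeys(a) creates one key per distinct item (values default to None);
# repeated items simply hit the key that already exists.
b = list(dict.fromkeys(a))

print(b)  # [1, 2, 3, 5, 9, 6, 8, 7]
```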


Answer 5:

Use groupby:

>>> from itertools import groupby
>>> a = [1, 2, 3, 3, 5, 9, 6, 2, 8, 5, 2, 3, 5, 7, 3, 5, 8]
>>> [k for k, _ in groupby(sorted(a, key=lambda x: a.index(x)))]
[1, 2, 3, 5, 9, 6, 8, 7]

Leave out the key argument if you don't care about the order in which the values first appeared in the original list, e.g.

>>> [k for k, _ in groupby(sorted(a))]
[1, 2, 3, 5, 6, 7, 8, 9]

You can do some cool things with groupby. To identify items that appear multiple times:

>>> [k for k, v in groupby(sorted(a)) if len(list(v)) > 1]
[2, 3, 5, 8]

Or to build up a frequency dictionary:

>>> {k: len(list(v)) for k, v in groupby(sorted(a))}
{1: 1, 2: 3, 3: 4, 5: 4, 6: 1, 7: 1, 8: 2, 9: 1}
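
For comparison, the same frequency table can be sketched with collections.Counter, which needs no sorting first. This is an alternative to the groupby version, not something the original answer proposes; the names counts and duplicates are illustrative:

```python
from collections import Counter

a = [1, 2, 3, 3, 5, 9, 6, 2, 8, 5, 2, 3, 5, 7, 3, 5, 8]

counts = Counter(a)  # maps each item to the number of times it occurs

# Items that appear more than once, in ascending order
duplicates = sorted(k for k, n in counts.items() if n > 1)

print(dict(counts) == {1: 1, 2: 3, 3: 4, 5: 4, 6: 1, 7: 1, 8: 2, 9: 1})  # True
print(duplicates)  # [2, 3, 5, 8]
```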

There are some very useful functions in the itertools module: chain, tee and product to name a few!



Answer 6:

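This pops the second occurrence of each repeated item while iterating over a, so a is modified in place:

>>> a = [10,20,30,20,10,50,60,40,80,50,40,0,100,30,60]
>>> [a.pop(a.index(i, a.index(i)+1)) for i in a if a.count(i) > 1]
[10, 20, 30, 50, 60, 40]
>>> print(a)
[10, 20, 30, 50, 60, 40, 80, 0, 100]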
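
Changing a list while iterating over it is easy to get wrong. A sketch of a pure list comprehension that leaves a untouched, keeping an item only at the index of its first occurrence (this is an added variant, and it takes quadratic time, so it is best suited to short lists):

```python
a = [10, 20, 30, 20, 10, 50, 60, 40, 80, 50, 40, 0, 100, 30, 60]

# a.index(x) returns the first position of x, so the test is true
# only for the first occurrence of each value.
b = [x for i, x in enumerate(a) if a.index(x) == i]

print(b)  # [10, 20, 30, 50, 60, 40, 80, 0, 100]
```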