Get unique values from a list in python [duplicate]

Posted 2018-12-31 14:15

I want to get the unique values from the following list:

[u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']

The output which I require is:

[u'nowplaying', u'PBS', u'job', u'debate', u'thenandnow']

This code works:

trends = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']
output = []
for x in trends:
    if x not in output:
        output.append(x)
print output

Is there a better solution I should use?

Tags: python
30 answers
余生请多指教
#2 · 2018-12-31 14:28

As a bonus, Counter is a simple way to get both the unique values and the count for each value:

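from collections import Counter
l = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']
c = Counter(l)
unique_values = list(c)  # the distinct values (the Counter's keys)
pbs_count = c[u'PBS']    # 2, because u'PBS' appears twice in l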
美炸的是我
#3 · 2018-12-31 14:29

First declare your list properly, separated by commas. You can get the unique values by converting the list to a set.

mylist = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']
myset = set(mylist)
print myset

If you need to use it as a list afterwards, convert it back to a list:

mynewlist = list(myset)

Another possibility, probably faster, is to use a set from the beginning instead of a list. Then your code would be:

output = set()
for x in trends:
    output.add(x)
print output

As has been pointed out, sets do not maintain the original order. If you need the order preserved, look into an ordered set.
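One way to keep the first-occurrence order without a separate ordered-set type is dict.fromkeys. This is a minimal sketch, not part of the original answer, assuming Python 3.7 or later, where plain dicts are guaranteed to preserve insertion order (on older versions collections.OrderedDict.fromkeys plays the same role):

```python
mylist = ['nowplaying', 'PBS', 'PBS', 'nowplaying', 'job', 'debate', 'thenandnow']

# Dict keys are unique and (on Python 3.7+) keep insertion order,
# so the first occurrence of each value decides its position.
ordered_unique = list(dict.fromkeys(mylist))
print(ordered_unique)
```

Like set, this requires every item to be hashable.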

妖精总统
#4 · 2018-12-31 14:29

If we need to keep the order of the elements, how about this:

used = set()
mylist = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']
unique = [x for x in mylist if x not in used and (used.add(x) or True)]

And one more solution using reduce and without the temporary used var.

from functools import reduce  # a builtin on Python 2, must be imported on Python 3
mylist = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']
unique = reduce(lambda l, x: l.append(x) or l if x not in l else l, mylist, [])

UPDATE - Oct 1, 2016

Another solution with reduce, but this time without .append, which makes it more readable and easier to understand.

mylist = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']
unique = reduce(lambda l, x: l+[x] if x not in l else l, mylist, [])
# which can also be written as:
unique = reduce(lambda l, x: l if x in l else l+[x], mylist, [])

NOTE: Keep in mind that the more human-readable we get, the less performant the script is.

import timeit

setup = "mylist = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']"

# Thanks to Michael for pointing out that we can get faster with set()
timeit.timeit('[x for x in mylist if x not in used and (used.add(x) or True)]', setup='used = set();'+setup)
0.4188511371612549

timeit.timeit('[x for x in mylist if x not in used and (used.append(x) or True)]', setup='used = [];'+setup)
0.8063139915466309

timeit.timeit('reduce(lambda l, x: l.append(x) or l if x not in l else l, mylist, [])', setup=setup)
2.216820001602173

timeit.timeit('reduce(lambda l, x: l+[x] if x not in l else l, mylist, [])', setup=setup)
2.948796033859253

timeit.timeit('reduce(lambda l, x: l if x in l else l+[x], mylist, [])', setup=setup)
2.9785239696502686
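One caveat about the first two measurements: used is created once in setup, so after the first iteration every element is already in it and the remaining iterations mostly measure the membership test. A minimal sketch that rebuilds the state on every run follows. It is written for Python 3, the helper names (with_set, with_list, with_reduce) are made up for illustration, and the absolute numbers will vary by machine:

```python
import timeit
from functools import reduce  # a builtin on Python 2, in functools on Python 3

mylist = ['nowplaying', 'PBS', 'PBS', 'nowplaying', 'job', 'debate', 'thenandnow']

def with_set():
    used = set()  # fresh state on every call
    return [x for x in mylist if x not in used and (used.add(x) or True)]

def with_list():
    used = []  # fresh state on every call
    return [x for x in mylist if x not in used and (used.append(x) or True)]

def with_reduce():
    return reduce(lambda l, x: l if x in l else l + [x], mylist, [])

# timeit.timeit accepts a zero-argument callable as the statement.
timings = {}
for func in (with_set, with_list, with_reduce):
    timings[func.__name__] = timeit.timeit(func, number=10000)
    print(func.__name__, timings[func.__name__])
```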

ANSWERING COMMENTS

@monica asked a good question: "how is this working?" For everyone having trouble figuring it out, I will try to give a deeper explanation of how this works and what sorcery is happening here ;)

So she first asked:

I try to understand why unique = [used.append(x) for x in mylist if x not in used] is not working.

Well it's actually working

>>> used = []
>>> mylist = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']
>>> unique = [used.append(x) for x in mylist if x not in used]
>>> print used
[u'nowplaying', u'PBS', u'job', u'debate', u'thenandnow']
>>> print unique
[None, None, None, None, None]

The problem is that we are just not getting the desired results inside the unique variable, but only inside the used variable. This is because during the list comprehension .append modifies the used variable and returns None.

So in order to get the results into the unique variable, and still use the same logic with .append(x) if x not in used, we need to move this .append call to the right side of the list comprehension and just return x on the left side.

But if we are too naive and just go with:

>>> unique = [x for x in mylist if x not in used and used.append(x)]
>>> print unique
[]

We will get nothing in return.

Again, this is because the .append method returns None, which makes our logical expression look like this:

x not in used and None

This will always:

  1. evaluate to False when x is in used,
  2. evaluate to None when x is not in used.

And in both cases (False/None), the result is treated as a falsy value, so we get an empty list.

But why does this evaluate to None when x is not in used, someone may ask.

Well, it's because this is how Python's short-circuit operators work.

The expression x and y first evaluates x; if x is false, its value is returned; otherwise, y is evaluated and the resulting value is returned.

So when x is not in used (i.e. when the first part is True), the next part of the expression (used.append(x)) will be evaluated and its value (None) will be returned.
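The two cases can be checked directly. A small sketch, written for Python 3, with variable names (already_seen, first_time) chosen only for illustration:

```python
used = ['PBS']

# Left side is False ('PBS' is already in used): `and` returns that
# False immediately and never calls append.
already_seen = 'PBS' not in used and used.append('PBS')

# Left side is True: `and` evaluates the right side and returns its
# value, which is None because list.append returns None.
first_time = 'job' not in used and used.append('job')

print(already_seen, first_time, used)
```

Note that the append in the second case still happened, which is why used fills up even though the filter rejects every element.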

But that's what we want in order to get the unique elements from a list with duplicates: we want to .append them to a new list only when we come across them for the first time.

So we really want to evaluate used.append(x) only when x is not in used. If there were a way to turn this None value into a truthy one, we would be fine, right?

Well, yes, and this is where the second short-circuit operator comes into play.

The expression x or y first evaluates x; if x is true, its value is returned; otherwise, y is evaluated and the resulting value is returned.

We know that .append(x) will always be falsy, so if we put an or after it, we will always get the part that follows. That's why we write:

x not in used and (used.append(x) or True)

so we can evaluate used.append(x) and get True as a result, only when the first part of the expression (x not in used) is True.
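Putting the pieces together, here is a short sketch written for Python 3. The function name unique_in_order is hypothetical, and the list is wrapped in a function only to keep used local:

```python
def unique_in_order(items):
    used = []
    # list.append returns None (falsy); `or True` turns the whole
    # parenthesised part into True, so x passes the filter exactly
    # when it has just been recorded for the first time.
    return [x for x in items if x not in used and (used.append(x) or True)]

append_result = [].append('job')            # None
patched_result = [].append('job') or True   # True

mylist = ['nowplaying', 'PBS', 'PBS', 'nowplaying', 'job', 'debate', 'thenandnow']
print(unique_in_order(mylist))
```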

A similar pattern can be seen in the second approach, with reduce.

(l.append(x) or l) if x not in l else l
# similar to the above, but maybe more readable:
# we return l unchanged when x is in l
# we append x to l and return l when x is not in l
l if x in l else (l.append(x) or l)

where we:

  1. Append x to l and return that l when x is not in l. Thanks to the or operator, .append is evaluated and l is returned after that.
  2. Return l untouched when x is in l
浅入江南
#5 · 2018-12-31 14:29

I am surprised that nobody so far has given a direct order-preserving answer:

def unique(sequence):
    """Generate unique items from sequence in the order of first occurrence."""
    seen = set()
    for value in sequence:
        if value in seen:
            continue

        seen.add(value)

        yield value

It will generate the values so it works with more than just lists, e.g. unique(range(10)). To get a list, just call list(unique(sequence)), like this:

>>> list(unique([u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']))
[u'nowplaying', u'PBS', u'job', u'debate', u'thenandnow']

It requires each item to be hashable, not just comparable, but most things in Python are. It runs in O(n) rather than O(n^2), so it works just fine with a long list.
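To make the hashability requirement concrete, here is a small sketch written for Python 3. It repeats the generator (with the continue folded into the condition) so the block is self-contained. The inputs and the variable names letters, remainders and raised are made up for illustration:

```python
def unique(sequence):
    """Generate unique items from sequence in the order of first occurrence."""
    seen = set()
    for value in sequence:
        if value not in seen:
            seen.add(value)
            yield value

# Works on any iterable of hashable items, not just lists.
letters = list(unique("mississippi"))
remainders = list(unique(x % 3 for x in range(10)))

# Unhashable items such as lists cannot be looked up in the `seen` set.
try:
    list(unique([[1, 2], [1, 2]]))
    raised = False
except TypeError:
    raised = True

print(letters, remainders, raised)
```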

萌妹纸的霸气范
#6 · 2018-12-31 14:31

A set is an unordered collection of unique elements, and a list of elements can be passed to its constructor. So pass the list with duplicate elements to set to get the unique elements, then convert the set back to a list. I can say nothing about performance and memory overhead, but I hope it is not so important with small lists.

list(set(my_not_unique_list))

Simple and short.
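Because the set is unordered, list(set(...)) makes no promise about the order of the result. If the original order matters, one option (a sketch, not part of the answer above) is to sort the set by each value's first index in the original list. list.index is a linear scan, so this is only reasonable for small lists:

```python
my_not_unique_list = ['nowplaying', 'PBS', 'PBS', 'nowplaying', 'job', 'debate', 'thenandnow']

# Unique values, in whatever order the set happens to yield them.
unordered = list(set(my_not_unique_list))

# Same values, sorted by where each one first appears in the input.
restored = sorted(set(my_not_unique_list), key=my_not_unique_list.index)
print(restored)
```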

谁念西风独自凉
#7 · 2018-12-31 14:32

I know this is an old question, but here's my unique solution: class inheritance!

class UniqueList(list):
    def appendunique(self,item):
        if item not in self:
            self.append(item)
            return True
        return False

Then, if you want to uniquely append items to a list, you just call appendunique on a UniqueList. Because it inherits from list, it basically acts like a list, so you can use methods like index() etc. And because it returns True or False, you can find out whether appending succeeded (unique item) or failed (already in the list).

To get a unique list of items from a list, use a for loop appending items to a UniqueList (then copy over to the list).

Example usage code:

unique = UniqueList()

for each in [1,2,2,3,3,4]:
    if unique.appendunique(each):
        print 'Uniquely appended ' + str(each)
    else:
        print 'Already contains ' + str(each)

Prints:

Uniquely appended 1
Uniquely appended 2
Already contains 2
Uniquely appended 3
Already contains 3
Uniquely appended 4

Copying to list:

unique = UniqueList()

for each in [1,2,2,3,3,4]:
    unique.appendunique(each)

newlist = unique[:]
print newlist

Prints:

[1, 2, 3, 4]
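Since item not in self scans the whole list on every call, appendunique gets slower as the list grows. A possible variant, sketched here for Python 3 under the assumption that all items are hashable, keeps a companion set for the membership test. The class name HashedUniqueList and the _seen attribute are hypothetical, and only appendunique keeps the set in sync, so other mutating list methods (remove, pop, slice assignment and so on) are deliberately not handled:

```python
class HashedUniqueList(list):
    """Hypothetical UniqueList variant that tracks its members in a set."""

    def __init__(self, iterable=()):
        super().__init__()
        self._seen = set()  # mirrors the items added via appendunique
        for item in iterable:
            self.appendunique(item)

    def appendunique(self, item):
        if item in self._seen:  # average O(1) instead of a list scan
            return False
        self._seen.add(item)
        self.append(item)
        return True

unique = HashedUniqueList([1, 2, 2, 3, 3, 4])
newlist = list(unique)
print(newlist)
```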