Question:
I have a list of non-unique strings:
list = ["a", "b", "c", "a", "a", "d", "b"]
I would like to replace each element with an integer key which uniquely identifies each string:
list = [0, 1, 2, 0, 0, 3, 1]
The number does not matter, as long as it is a unique identifier.
So far all I can think to do is copy the list to a set, and use the index of the set to reference the list. I'm sure there's a better way though.
Answer 1:
This guarantees uniqueness, and the ids are contiguous, starting from 0:
id_s = {c: i for i, c in enumerate(set(list))}
li = [id_s[c] for c in list]
On a different note, you should not use 'list' as a variable name, because it shadows the built-in type list.
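If the ids should also follow the order of first appearance (a set has no defined order, so the numbering above can differ between runs), one possible variant builds the lookup from dict.fromkeys, which keeps insertion order on Python 3.7+. This is a minimal sketch; the names items, ids and encoded are illustrative and not part of the answer:

```python
items = ["a", "b", "c", "a", "a", "d", "b"]

# dict.fromkeys drops duplicates while keeping first-seen order (Python 3.7+)
ids = {s: i for i, s in enumerate(dict.fromkeys(items))}
encoded = [ids[s] for s in items]
print(encoded)  # [0, 1, 2, 0, 0, 3, 1]
```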
Answer 2:
Here's a single pass solution with defaultdict:
from collections import defaultdict
seen = defaultdict()
seen.default_factory = lambda: len(seen) # you could instead bind to seen.__len__
In [11]: [seen[c] for c in list]
Out[11]: [0, 1, 2, 0, 0, 3, 1]
It's kind of a trick but worth mentioning!
An alternative, suggested by @user2357112 in a related question/answer, is to increment with itertools.count. This lets you set everything up in the constructor:
from itertools import count
seen = defaultdict(count().__next__) # .next in python 2
This may be preferable, as the default_factory callable won't have to look up seen in the global scope.
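Putting the count variant together as a reusable function might look like the sketch below. The function name encode is hypothetical, chosen only for illustration:

```python
from collections import defaultdict
from itertools import count


def encode(strings):
    """Map each distinct string to the next unused integer, in a single pass."""
    # Each missing key triggers the factory, which hands out 0, 1, 2, ...
    seen = defaultdict(count().__next__)
    return [seen[s] for s in strings]


print(encode(["a", "b", "c", "a", "a", "d", "b"]))  # [0, 1, 2, 0, 0, 3, 1]
```

Because seen is local to the function, every call starts numbering from 0 again.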
Answer 3:
>>> lst = ["a", "b", "c", "a", "a", "d", "b"]
>>> nums = [ord(x) for x in lst]
>>> print(nums)
[97, 98, 99, 97, 97, 100, 98]
Answer 4:
If you are not picky, use the built-in hash function: it returns an integer, and within a single run equal strings always return the same hash:
li = ["a", "b", "c", "a", "a", "d", "b"]
li = list(map(hash, li)) # Turn list of strings into list of ints
# or, equivalently: li = [hash(item) for item in li]
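Two caveats seem worth illustrating. The values are large and not contiguous, and on Python 3 string hashes are randomized per process by default, so they are only stable within a single run unless PYTHONHASHSEED is fixed. Distinct strings are also not strictly guaranteed to get distinct hashes. A small sketch, with the illustrative name hashed:

```python
li = ["a", "b", "c", "a", "a", "d", "b"]
hashed = [hash(item) for item in li]

# Equal strings always hash equally within one interpreter process
assert hashed[0] == hashed[3] == hashed[4]
assert hashed[1] == hashed[6]

# Distinct strings are very likely, but not guaranteed, to hash differently
print(len(set(hashed)))
```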
Answer 5:
A functional approach:
l = ["a", "b", "c", "a", "a", "d", "b", "abc", "def", "abc"]
from itertools import count
from operator import itemgetter
mapped = itemgetter(*l)(dict(zip(l, count())))
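To make the one-liner above easier to follow, here it is unpacked into steps. Note that dict(zip(l, count())) keeps the last index at which each string occurs, so the ids are unique per string but not contiguous. The intermediate name last_index is illustrative:

```python
from itertools import count
from operator import itemgetter

l = ["a", "b", "c", "a", "a", "d", "b", "abc", "def", "abc"]

# Later pairs overwrite earlier ones, so each string maps to its last index
last_index = dict(zip(l, count()))
# itemgetter(*l) looks up every element of l and returns a tuple
mapped = itemgetter(*l)(last_index)
print(mapped)  # (4, 6, 2, 4, 4, 5, 6, 9, 8, 9)
```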
You could also use a simple generator function:
from itertools import count
def uniq_ident(l):
    cn, d = count(), {}
    for ele in l:
        if ele not in d:
            # first time we see this element: assign the next id
            c = next(cn)
            d[ele] = c
            yield c
        else:
            yield d[ele]
In [35]: l = ["a", "b", "c", "a", "a", "d", "b"]
In [36]: list(uniq_ident(l))
Out[36]: [0, 1, 2, 0, 0, 3, 1]