I have a list of non-unique strings:
list = ["a", "b", "c", "a", "a", "d", "b"]
I would like to replace each element with an integer key which uniquely identifies each string:
list = [0, 1, 2, 0, 0, 3, 1]
The number does not matter, as long as it is a unique identifier.
So far all I can think to do is copy the list to a set, and use the index of the set to reference the list. I'm sure there's a better way though.
A functional approach:
You could also use a simple generator function:
This will guarantee uniqueness and that the id's are contiguous starting from
0
:On a different note, you should not use
'list'
as variable name because it will shadow the built-in typelist
.Here's a single pass solution with defaultdict:
It's kind of a trick but worth mentioning!
An alternative, suggested by @user2357112 in a related question/answer, is to increment with
itertools.count
. This allows you to do this just in the constructor:This may be preferable as the default_factory method won't look up
seen
in global scope.If you are not picky, then use the hash function: it returns an integer. For strings that are the same, it returns the same hash: