Match accented strings in a list of strings in Python

Posted 2019-05-26 17:55

Question:

Why does this return False in Python 3? And what is a way to make it return True?

e = "allé.png"
l = ["allé.png"]

print(e in l)

Answer 1:

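When comparing Unicode strings, normalize both sides first with the unicodedata module. Two strings can look identical yet differ in code points: "é" may be stored as the single precomposed character U+00E9 or as "e" followed by the combining acute accent U+0301, and Python compares code points, not rendered glyphs. To search a large list, normalize each element with map or a list comprehension: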

import unicodedata
from functools import partial

normalize = partial(unicodedata.normalize, 'NFC')

e = "allé.png"
e = normalize(e)
l = ["allé.png"]
print(e in map(normalize, l))

Output

True

Or as an alternative:

print(e in [normalize(s) for s in l])
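
To make the cause visible, here is a minimal sketch that spells both forms with escape sequences, so the difference survives copy and paste. The helper names nfc and contains_normalized and the file name "other.png" are made up for illustration; only unicodedata.normalize comes from the standard library. It assumes the mismatch in the question is composed versus decomposed "é", which is the usual cause but cannot be confirmed from the rendered text alone.

```python
import unicodedata

# Two spellings of the same visible text.
composed = "all\u00e9.png"      # 'é' as one code point (U+00E9)
decomposed = "alle\u0301.png"   # 'e' followed by combining acute accent (U+0301)

print(composed == decomposed)          # False: different code point sequences
print(len(composed), len(decomposed))  # 8 9


def nfc(text):
    """Return text in NFC form (hypothetical helper name)."""
    return unicodedata.normalize("NFC", text)


def contains_normalized(needle, haystack):
    """Check membership after normalizing both sides to NFC."""
    target = nfc(needle)
    return any(nfc(item) == target for item in haystack)


print(contains_normalized(decomposed, [composed]))  # True

# For repeated lookups against the same list, normalize once into a set
# instead of re-normalizing the whole list on every query.
known = {nfc(name) for name in [composed, "other.png"]}
print(nfc(decomposed) in known)  # True
```

If the list is searched many times, the set variant is probably the better fit, since each lookup then only normalizes the query string.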

Further reading

  1. What does unicodedata.normalize do in Python?
  2. Normalizing Unicode