Hebrew characters have Unicode code points between 1424 and 1514 (hex 0590 to 05EA).
I'm looking for the right, most efficient and most Pythonic way to check whether a string contains any Hebrew characters.
First I came up with this:
for c in s:
    if ord(c) >= 1424 and ord(c) <= 1514:
        return True
return False
Then I came up with a more elegant implementation:
return any(map(lambda c: (ord(c) >= 1424 and ord(c) <= 1514), s))
And maybe:
return any([(ord(c) >= 1424 and ord(c) <= 1514) for c in s])
Which of these is best? Or should I do it differently?
You could do:
# Python 3.
return any("\u0590" <= c <= "\u05EA" for c in s)
# Python 2.
return any(u"\u0590" <= c <= u"\u05EA" for c in s)
Your basic options are:
- Match against a regex containing the range of characters; or
- Iterate over the string, testing for membership of the character in a string or set containing all of your target characters, and break if you find a match.
Only actual testing can show which is going to be faster.
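As a rough sketch of those two options (Python 3; the function names are hypothetical, and the range U+0590 to U+05EA is simply the one given in the question), something like this could serve as a starting point for a comparison:

```python
import re

# Range taken from the question: U+0590 through U+05EA (an assumption,
# not a statement about which code points count as Hebrew in general).
HEBREW_RE = re.compile("[\u0590-\u05EA]")
HEBREW_CHARS = frozenset(chr(cp) for cp in range(0x0590, 0x05EA + 1))


def contains_hebrew_regex(s):
    """Option 1: search for any character in the range with a regex."""
    return HEBREW_RE.search(s) is not None


def contains_hebrew_set(s):
    """Option 2: test each character for set membership.

    any() stops at the first match, which plays the role of the break.
    """
    return any(c in HEBREW_CHARS for c in s)
```

Both functions could then be passed to timeit.timeit with strings representative of your real input, since the relative speed is likely to depend on string length and on where the first Hebrew character appears.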
It's simple to check the first character with unicodedata:
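import unicodedata

def is_greek(term):
    # Looks only at the first non-whitespace character; the '' default
    # avoids a ValueError for characters that have no Unicode name.
    return 'GREEK' in unicodedata.name(term.strip()[0], '')

def is_hebrew(term):
    return 'HEBREW' in unicodedata.name(term.strip()[0], '')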
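If the whole string should be checked rather than only its first character, the same idea can be combined with any(). This is just a sketch: contains_script is a hypothetical helper, and matching on the character name is a heuristic rather than a proper script lookup.

```python
import unicodedata


def contains_script(term, script_name):
    """Return True if any character's Unicode name mentions script_name."""
    # The "" default avoids a ValueError for characters without a name.
    return any(script_name in unicodedata.name(c, "") for c in term)
```

For example, contains_script(s, "HEBREW") would be true for a string that mixes Latin and Hebrew letters, and an empty string simply yields False instead of raising an IndexError.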