I came across a strange problem with Python's isdigit() function.
For example:
>>> a = u'\u2466'
>>> a.isdigit()
Out[1]: True
>>> a.isnumeric()
Out[2]: True
Why is this character a digit?
Is there any way to make this return False instead? Thanks.
Edit: If I don't want to treat it as a digit, how can I filter it out?
For example, when I try to convert it to an int:
>>> int(u'\u2466')
a UnicodeEncodeError is raised.
U+2466 is the CIRCLED DIGIT SEVEN (⑦), so yes, it's a digit.
If your definition of a digit differs from the Unicode Consortium's, you might have to write your own isdigit() function.
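To see why Python classifies U+2466 this way, you can look up the character's properties in the Unicode database with the standard unicodedata module. A minimal sketch, assuming Python 3: the character has a digit value but no decimal value, which is exactly the line between isdigit() and isdecimal().

```python
import unicodedata

ch = "\u2466"  # CIRCLED DIGIT SEVEN

print(unicodedata.name(ch))           # CIRCLED DIGIT SEVEN
print(unicodedata.category(ch))       # No (Number, other)
print(unicodedata.digit(ch, None))    # 7    -> has a digit value, so isdigit() is True
print(unicodedata.decimal(ch, None))  # None -> no decimal value, so isdecimal() is False
print(unicodedata.numeric(ch, None))  # 7.0  -> has a numeric value, so isnumeric() is True
print(ch.isdigit(), ch.isdecimal())   # True False
```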
Edit, If I don't want to treat it as a digit, then how to filter it out?
If you are only interested in the ASCII digits 0...9, you could do something like:
In [4]: s = u'abc 12434 \u2466 5 def'
In [5]: u''.join(c for c in s if '0' <= c <= '9')
Out[5]: u'124345'
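If the goal is to turn the ASCII-digit parts into integers rather than concatenate them, a regular expression can pull out each run of digits. This is a sketch assuming Python 3, where str is Unicode; the explicit [0-9] class is used because \d in a Python 3 str pattern would also match decimal digits from other scripts.

```python
import re

s = "abc 12434 \u2466 5 def"

# [0-9] matches only ASCII digits, so the circled seven is skipped.
# Each run of consecutive ASCII digits becomes one token.
tokens = re.findall(r"[0-9]+", s)
numbers = [int(t) for t in tokens]

print(tokens)   # ['12434', '5']
print(numbers)  # [12434, 5]
```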
If you're going to convert something to int, you need isdecimal() rather than isdigit().
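As a sketch of that advice, assuming Python 3 (where int(u'\u2466') raises ValueError rather than UnicodeEncodeError), you can check isdecimal() before calling int(). The helper name to_int_or_none is hypothetical, and it is deliberately stricter than int(): signs and surrounding whitespace are rejected too.

```python
def to_int_or_none(s):
    # Hypothetical helper: convert only if every character is a Unicode
    # decimal digit. Digit-only characters such as U+2466, signs and
    # whitespace make it return None instead of raising an exception.
    if s.isdecimal():
        return int(s)
    return None

print(to_int_or_none("42"))      # 42
print(to_int_or_none("\u2466"))  # None
print(to_int_or_none("-3"))      # None (the sign is not a decimal character)
```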
Note that "decimal" does not mean just 0, 1, 2, ... 9: there are a number of characters that can be interpreted as decimal digits and converted to an integer. Example (Python 2):
#coding=utf8
s = u"1٢٣٤5"
print s.isdecimal() # True
print int(s) # 12345
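To see how the three predicates nest (every decimal is a digit, and every digit is numeric), you can compare them on a few characters. A Python 3 sketch; the sample characters are chosen for illustration.

```python
# Compare isdecimal(), isdigit() and isnumeric() on a few sample characters.
samples = {
    "7": "DIGIT SEVEN (ASCII)",
    "\u0663": "ARABIC-INDIC DIGIT THREE",
    "\u2466": "CIRCLED DIGIT SEVEN",
    "\u00bd": "VULGAR FRACTION ONE HALF",
}

for ch, name in samples.items():
    print(name, ch.isdecimal(), ch.isdigit(), ch.isnumeric())

# DIGIT SEVEN (ASCII) True True True
# ARABIC-INDIC DIGIT THREE True True True
# CIRCLED DIGIT SEVEN False True True
# VULGAR FRACTION ONE HALF False False True
```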
The character is the CIRCLED DIGIT SEVEN, which is numeric and a digit.
If you want to restrict the digits to the usual 0-9, use a regular expression:
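import re

def myIsDigit(s):
    # True if s contains no character outside ASCII 0-9.
    # Note: an empty string also returns True here, unlike isdigit().
    return re.search("[^0-9]", s) is None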
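If you also want the empty string to return False, as isdigit() does, a variant can anchor the pattern at both ends with re.fullmatch (Python 3.4+). This is a sketch; the name is_ascii_digits is hypothetical.

```python
import re

def is_ascii_digits(s):
    # Hypothetical helper: True only for a non-empty string made of ASCII 0-9.
    # re.fullmatch anchors the pattern at both ends, so no separate search
    # for "any other character" is needed.
    return re.fullmatch(r"[0-9]+", s) is not None

print(is_ascii_digits("12345"))          # True
print(is_ascii_digits("\u2466"))         # False
print(is_ascii_digits("1\u0662\u0663"))  # False
print(is_ascii_digits(""))               # False
```

On Python 3.7+, s.isascii() and s.isdigit() gives the same result without a regular expression, because 0-9 are the only digits in the ASCII range.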