I came across a strange problem with Python's isdigit function.
For example:
>>> a = u'\u2466'
>>> a.isdigit()
Out[1]: True
>>> a.isnumeric()
Out[2]: True
Why is this character a digit?
Is there any way to make this return False instead? Thanks.
Edit: If I don't want to treat it as a digit, how can I filter it out?
For example, when I try to convert it to an int:
>>> int(u'\u2466')
a UnicodeEncodeError is raised.
If you're going to convert something to int, you need isdecimal rather than isdigit.
Note that "decimal" is not just 0, 1, 2, ... 9; there are a number of other characters that can be interpreted as decimal digits and converted to an integer. For example, U+0667 ARABIC-INDIC DIGIT SEVEN (٧) is a decimal digit and converts to 7.
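A minimal sketch of the isdecimal check described above. The helper name to_int_or_none is hypothetical and not part of any library; it assumes the input is a Unicode string (a u'' literal in Python 2, any str in Python 3).

```python
def to_int_or_none(text):
    """Return int(text) if every character is a Unicode decimal digit, else None.

    Hypothetical helper for illustration only.
    """
    # isdecimal() is stricter than isdigit(): it rejects characters such as
    # U+2466 (CIRCLED DIGIT SEVEN) that int() cannot convert.
    if text.isdecimal():
        return int(text)
    return None


print(to_int_or_none(u'42'))      # ASCII decimal digits
print(to_int_or_none(u'\u0667'))  # ARABIC-INDIC DIGIT SEVEN, a decimal digit
print(to_int_or_none(u'\u2466'))  # CIRCLED DIGIT SEVEN, a digit but not decimal
```

Returning None rather than raising is just one design choice; catching the exception from int() directly would work as well.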
The character is the CIRCLED DIGIT SEVEN, which is numeric and a digit.
If you want to restrict the digits to the usual 0-9, use a regular expression.
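One way the regular-expression check might look. This is a sketch, not necessarily the pattern the answer originally used; the names ASCII_NUMBER and is_ascii_number are made up for illustration. It spells out [0-9] instead of \d because \d also matches non-ASCII decimal digits when applied to Unicode strings in Python 3.

```python
import re

# [0-9] is explicit about ASCII; \Z anchors at the true end of the string,
# so a trailing newline is not silently accepted (as it would be with $).
ASCII_NUMBER = re.compile(r'[0-9]+\Z')


def is_ascii_number(text):
    """Return True only if text consists solely of the characters 0-9."""
    return ASCII_NUMBER.match(text) is not None


print(is_ascii_number(u'2466'))    # plain ASCII digits
print(is_ascii_number(u'\u2466'))  # CIRCLED DIGIT SEVEN
```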
U+2466 is the CIRCLED DIGIT SEVEN (⑦), so yes, it's a digit.
If your definition of what is a digit differs from that of the Unicode Consortium, you might have to write your own isdigit() method.
If you are just interested in the ASCII digits 0...9, you could test against those characters explicitly.
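A rough sketch of such a hand-written check, plus a filter for the "how to filter it out" part of the question. The names ASCII_DIGITS, is_ascii_digit and keep_ascii_digits are hypothetical, chosen only for this example.

```python
# The ten ASCII digit characters, as a set for fast membership tests.
ASCII_DIGITS = frozenset(u'0123456789')


def is_ascii_digit(text):
    """Like isdigit(), but only the characters 0-9 count as digits."""
    # Mirror isdigit(): an empty string is not a digit string.
    return len(text) > 0 and all(ch in ASCII_DIGITS for ch in text)


def keep_ascii_digits(text):
    """Drop every character that is not one of 0-9."""
    return u''.join(ch for ch in text if ch in ASCII_DIGITS)


print(is_ascii_digit(u'\u2466'))
print(keep_ascii_digits(u'abc 12434 \u2466 5'))
```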