I'm trying to check if a string is numeric or not, using the isnumeric
function, but the results are not as expected. The function works only if it's a unicode string.
>>> a=u'1'
>>> a.isnumeric()
True
>>> a='1'
>>> a.isnumeric()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'isnumeric'
isnumeric works only if the string is unicode. Any reason why?
Often you will want to check if a string in Python is a number. This
happens all the time, for example with user input, fetching data from
a database (which may return a string), or reading a file containing
numbers. Depending on what type of number you are expecting, you can
use several methods. Such as parsing the string, using regex, or
simply attempting to cast (convert) it to a number and see what
happens. Often you will also encounter non-ASCII numbers, encoded in
Unicode. These may or may not be numbers. For example ๒, which is 2 in
Thai. However © is simply the copyright symbol, and is obviously not a
number.
Source: http://pythoncentral.io/how-to-check-if-a-string-is-a-number-in-python-including-unicode/
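The two simplest approaches mentioned in the quote, casting and regex, can be sketched roughly as follows. The helper names looks_numeric_by_cast and looks_numeric_by_regex and the pattern itself are assumptions made for illustration; they are not taken from the linked article.

```python
import re

# Hypothetical pattern: optional sign, then ASCII digits with an optional
# fractional part. It deliberately rejects exponents such as '1e5'.
_DECIMAL_RE = re.compile(r'^[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)\Z')


def looks_numeric_by_cast(text):
    """Try to convert the text to float and report whether that worked."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def looks_numeric_by_regex(text):
    """Match the text against the simple decimal pattern above."""
    return _DECIMAL_RE.match(text) is not None
```

The two checks do not agree on every input: the cast accepts anything float() understands, including exponent notation, while the regex accepts only what the pattern spells out.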
According to the Python 2 documentation, isnumeric is only present for unicode objects:
The following methods are present only on unicode objects:
unicode.isnumeric()
Return True if there are only numeric characters in S, False otherwise. Numeric characters include digit characters, and all characters that have the Unicode numeric value property, e.g. U+2155, VULGAR FRACTION ONE FIFTH.
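Given that, one possible workaround is to decode a byte string to unicode before calling isnumeric(). The sketch below is only an illustration: the name is_numeric_text and the default UTF-8 encoding are assumptions, and the right encoding depends on where your data comes from.

```python
def is_numeric_text(value, encoding='utf-8'):
    """Return True if value contains only Unicode numeric characters.

    Byte strings (plain str in Python 2) are decoded first, because
    isnumeric() is only defined on unicode text.
    """
    if isinstance(value, bytes):
        try:
            value = value.decode(encoding)
        except UnicodeDecodeError:
            # Bytes that are not valid in the assumed encoding are not numeric.
            return False
    return value.isnumeric()
```

With this helper both '1' and u'1' are treated the same way, while a string such as u'123.4' is still rejected because '.' is not a numeric character.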
isnumeric() has extended support for different numeral systems in Unicode strings.
In the Americas and Europe, the Hindu-Arabic numeral system is used, which consists of the digits 0123456789.
Unicode also calls the Hindu-Arabic numerals European digits.
There are other numeral systems available, such as:
- Roman numerals
- Ancient Greek numerals
- Tamil numerals
- Japanese numerals
- Chinese numerals
- Korean numerals
More information about numeral systems can be found here: wikiwand.com/en/Numerals_in_Unicode#/Numerals_by_script
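To see what value Unicode assigns to characters from some of these systems, the standard unicodedata module can be queried directly. The wrapper name numeric_value is an assumption for this sketch; unicodedata.numeric() itself is part of the standard library.

```python
import unicodedata


def numeric_value(ch, default=None):
    """Return the Unicode numeric value of one character, or default."""
    return unicodedata.numeric(ch, default)


# A few sample characters from different numeral systems.
tamil_one = u'\u0be7'      # TAMIL DIGIT ONE
roman_eight = u'\u2167'    # ROMAN NUMERAL EIGHT
cjk_three = u'\u4e09'      # CJK ideograph for three
copyright_sign = u'\u00a9' # COPYRIGHT SIGN, has no numeric value
```

Calling numeric_value() on the first three samples returns 1.0, 8.0 and 3.0, while the copyright sign falls back to the default because it carries no numeric property.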
Unicode subscript, superscript and fractions are also considered valid numerals by the isnumeric() function.
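The difference between isnumeric() and the stricter isdecimal() and isdigit() checks shows up with exactly these characters. The helper name classify is an assumption used only to collect the three results in one place.

```python
def classify(ch):
    """Return (isdecimal, isdigit, isnumeric) for a unicode string."""
    return (ch.isdecimal(), ch.isdigit(), ch.isnumeric())


ascii_seven = u'7'
superscript_two = u'\u00b2'  # SUPERSCRIPT TWO
subscript_two = u'\u2082'    # SUBSCRIPT TWO
one_half = u'\u00bd'         # VULGAR FRACTION ONE HALF
roman_two = u'\u2161'        # ROMAN NUMERAL TWO
```

Superscript and subscript digits pass isdigit() and isnumeric() but not isdecimal(), whereas fractions and Roman numerals pass only isnumeric().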
You can use the isnumeric() function below to check whether a string holds a plain, non-Unicode number, that is, one that float() can parse.
# Python 2
l = ['abc' + chr(255), 'abc', '123', '45a6', '78b', u"\u2155", '123.4', u'\u2161', u'\u2168']

def isnumeric(s):
    '''Returns True for all non-unicode numbers that float() can parse'''
    try:
        # Fails for invalid UTF-8 bytes and for non-ASCII unicode objects
        s = s.decode('utf-8')
    except UnicodeError:
        return False
    try:
        float(s)
        return True
    except ValueError:
        return False

for i in l:
    print i, 'isnumeric:', isnumeric(i)

print '--------------------'
print u'\u2169', 'isnumeric', u'\u2169'.isnumeric()
print u'\u2165', 'isnumeric', u'\u2165'.isnumeric()