I know variants of this question have been asked frequently before (see here and here for instance), but this is not an exact duplicate of those.
I would like to check if a String is a number, and if so I would like to store it as a double. There are several ways to do this, but all of them seem inappropriate for my purposes.
One solution would be to use Double.parseDouble(s) or similarly new BigDecimal(s). However, those solutions don't work if there are commas present (so "1,234" would cause an exception). I could of course strip out all commas before using these techniques, but that would seem to pose loads of problems in other locales.
I looked at Apache Commons NumberUtils.isNumber(s), but that suffers from the same comma issue.
I considered NumberFormat or DecimalFormat, but those seemed far too lenient. For instance, "1A" is parsed as 1 instead of being rejected as not a number. Furthermore, something like "127.0.0.1" is counted as the number 127 instead of being rejected.
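To make the leniency concrete, here is a minimal sketch of the behaviour, assuming the US locale. LenientParseDemo and lenientParse are just illustrative names; only NumberFormat.getInstance(Locale) and parse(String) are JDK calls.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

final class LenientParseDemo {
    // Uses the one-argument parse(String), which stops at the first character
    // it cannot interpret and silently ignores the rest of the input.
    static double lenientParse(String s) {
        try {
            return NumberFormat.getInstance(Locale.US).parse(s).doubleValue();
        } catch (ParseException e) {
            // Only thrown when not even the beginning of the string is numeric.
            throw new IllegalArgumentException("Not a number: " + s, e);
        }
    }
}
```

With this, lenientParse("1A") gives 1.0 and lenientParse("127.0.0.1") gives 127.0, and neither call signals a problem.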
I feel like my requirements aren't so exotic that I'm the first to do this, but none of the solutions does exactly what I need. I suppose even I don't know exactly what I need (otherwise I could write my own parser), but I know the above solutions do not work for the reasons indicated. Does any solution exist, or do I need to figure out precisely what I need and write my own code for it?
You can specify the Locale that you need:
This should work in your example, since the German locale uses a comma as the decimal separator.
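A minimal sketch of locale-specific parsing along those lines. LocaleParseExample and its parse helper are hypothetical names; NumberFormat.getInstance(Locale) and parse(String) are the only JDK calls involved.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

final class LocaleParseExample {
    // Parses s using the number conventions of the given locale.
    static double parse(String s, Locale locale) {
        try {
            return NumberFormat.getInstance(locale).parse(s).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number: " + s, e);
        }
    }
}
```

Note that the locale changes the meaning of the input: with Locale.GERMANY the string "1,234" is read as 1.234, while with Locale.US the comma is a grouping separator and the result is 1234.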
Unfortunately Double.parseDouble(s) or new BigDecimal(s) seem to be your best options.
You cite localisation concerns, but unfortunately there is no way to reliably support all locales without the user specifying one anyway. It is just impossible.
Sometimes you can reason about the scheme used by looking at whether commas or periods are used first, if both are used, but this isn't always possible, so why even try? Better to have a system which you know works reliably in certain situations than try to rely on one which may work in more situations but can also give bad results...
What does the number 123,456 represent? 123456 or 123.456?
Just strip commas, or spaces, or periods, depending on the locale specified by the user. Default to stripping spaces and commas. If you want to make it stricter, only strip commas OR spaces, not both, and only before the period if there is one. It should also be pretty easy to check manually whether the groups are properly spaced in threes. In fact a custom parser might be easiest here.
Here is a bit of a proof of concept. It's a bit (very) messy but I reckon it works, and you get the idea anyways :).
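As a rough illustration of the stricter variant (commas only, only before the period, groups of three), here is a minimal sketch. It uses a regular expression rather than a hand-written character loop, and GroupedNumberParser is a made-up name. It assumes ',' for grouping and '.' for decimals.

```java
import java.util.regex.Pattern;

final class GroupedNumberParser {
    // Optional sign, then either plain digits or 1-3 digits followed by one or
    // more ",ddd" groups, then an optional fractional part after a period.
    private static final Pattern GROUPED =
            Pattern.compile("[+-]?(\\d+|\\d{1,3}(,\\d{3})+)(\\.\\d+)?");

    // Returns the parsed value, or null if the input is not a well-formed number.
    static Double parse(String s) {
        if (s == null) {
            return null;
        }
        String trimmed = s.trim();
        if (!GROUPED.matcher(trimmed).matches()) {
            return null;
        }
        // The pattern only admits commas before the period, so stripping all is safe.
        return Double.parseDouble(trimmed.replace(",", ""));
    }
}
```

Here "1,234.56" parses to 1234.56, while "12,34", "1A" and "127.0.0.1" all come back as null.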
EDIT: obviously this would need to be extended for recognising scientific notation, but this should be simple enough, especially as you don't have to actually validate anything after the e, you can just let parseDouble fail if it is badly formed.
It might also be a good idea to properly extend NumberFormat with this: have a getSeparator() for parsed numbers and a setSeparator() for giving the desired output format. This sort of takes care of localisation, but again more work would need to be done to support ',' for decimals...
You can use the ParsePosition as a check for complete consumption of the string in a NumberFormat.parse operation. If the whole string is consumed, then you don't have a "1A" situation. If not, you do and can behave accordingly. See here for a quick outline of the solution and here for the related JDK bug, which is closed as "won't fix" because of the ParsePosition option.
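A minimal sketch of that check. StrictNumberParser and parseStrict are hypothetical names; NumberFormat.parse(String, ParsePosition) and ParsePosition.getIndex() are the JDK calls doing the work.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

final class StrictNumberParser {
    // Returns the parsed value, or null if parsing fails or if any part of
    // the input is left unconsumed (as in "1A" or "127.0.0.1").
    static Double parseStrict(String s, Locale locale) {
        NumberFormat format = NumberFormat.getInstance(locale);
        ParsePosition position = new ParsePosition(0);
        Number number = format.parse(s, position);
        if (number == null || position.getIndex() != s.length()) {
            return null;
        }
        return number.doubleValue();
    }
}
```

With Locale.US, parseStrict("1,234", Locale.US) gives 1234.0, while "1A" and "127.0.0.1" give null. As far as I know this does not validate where the grouping separators sit, so that would need a separate check if it matters.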
Not sure if it meets all your requirements, but the code found here might point you in the right direction?
From the article:
This code should handle most inputs, except IP addresses in which every group of digits has three digits (e.g. 255.255.255.255 is accepted as valid, whereas 255.1.255.255 is not). It also doesn't support scientific notation.
It will work with most variants of separators (",", "." or space). If more than one separator is detected, the first is assumed to be the thousands separator, with additional checks (validity etc.).
Edit: prevDigit is used for checking that the number uses thousand separators correctly. If there is more than one group of thousands, all but the first one must be in groups of 3. I modified the code to make it clearer so that "3" is not a magic number but a constant.
Edit 2: I don't mind the down votes much, but can someone explain what the problem is?
Test code:
This will take a string, count its decimals and commas, remove commas, conserve a valid decimal (note that this is based on US standardization - in order to handle 1.000.000,00 as 1 million this process would have to have the decimal and comma handling switched), determine if the structure is valid, and then return a double. Returns null if the string could not be converted. Edit: Added support for international or US. convertStoD(string,true) for US, convertStoD(string,false) for non US. Comments are now for US version.
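A simplified sketch of the steps described above, not the full implementation. Only the name convertStoD and its boolean flag come from the description; the class name is made up, and this version does not check that the groups are spaced in threes.

```java
final class ConvertExample {
    // us == true: ',' groups digits and '.' is the decimal mark.
    // us == false: the roles are swapped (e.g. 1.000.000,00).
    static Double convertStoD(String s, boolean us) {
        if (s == null || s.isEmpty()) {
            return null;
        }
        char group = us ? ',' : '.';
        char decimal = us ? '.' : ',';
        int decimalIndex = s.indexOf(decimal);
        // More than one decimal mark is never valid.
        if (decimalIndex != s.lastIndexOf(decimal)) {
            return null;
        }
        String intPart = decimalIndex < 0 ? s : s.substring(0, decimalIndex);
        String fracPart = decimalIndex < 0 ? "" : s.substring(decimalIndex + 1);
        // Group separators are not allowed after the decimal mark.
        if (fracPart.indexOf(group) >= 0) {
            return null;
        }
        // Drop group separators and normalise the decimal mark to '.'.
        String normalized = intPart.replace(String.valueOf(group), "");
        if (decimalIndex >= 0) {
            normalized = normalized + "." + fracPart;
        }
        try {
            return Double.parseDouble(normalized);
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

For example, convertStoD("1,234.56", true) gives 1234.56 and convertStoD("1.000.000,00", false) gives 1000000.0, while convertStoD("127.0.0.1", true) gives null. Because the final step is Double.parseDouble, anything that method accepts on its own (such as exponents) is accepted here too.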