Determine if a String is a number and convert in J

Posted 2020-05-17 04:52

I know variants of this question have been asked frequently before (see here and here for instance), but this is not an exact duplicate of those.

I would like to check if a String is a number, and if so I would like to store it as a double. There are several ways to do this, but all of them seem inappropriate for my purposes.

One solution would be to use Double.parseDouble(s) or similarly new BigDecimal(s). However, those solutions don't work if there are commas present (so "1,234" would cause an exception). I could of course strip out all commas before using these techniques, but that would seem to pose loads of problems in other locales.

I looked at Apache Commons NumberUtils.isNumber(s), but that suffers from the same comma issue.

I considered NumberFormat or DecimalFormat, but those seemed far too lenient. For instance, "1A" is formatted to "1" instead of indicating that it's not a number. Furthermore, something like "127.0.0.1" will be counted as the number 127 instead of indicating that it's not a number.
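To make the leniency concrete, here is a minimal sketch that reproduces the behaviour described above. The class name LenientParseDemo, the helper lenientParse, and the choice of Locale.US are assumptions for illustration only.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

class LenientParseDemo {
    // Hypothetical helper: uses NumberFormat.parse(String), which stops at the
    // first character that cannot be part of a number and ignores the rest.
    static double lenientParse(String s) {
        try {
            return NumberFormat.getInstance(Locale.US).parse(s).doubleValue();
        } catch (ParseException e) {
            // Only thrown when the very beginning of the string is not numeric.
            throw new IllegalArgumentException("Not a number: " + s, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lenientParse("1,234"));     // 1234.0
        System.out.println(lenientParse("1A"));        // 1.0, the trailing "A" is silently ignored
        System.out.println(lenientParse("127.0.0.1")); // 127.0, parsing stops at the second '.'
    }
}
```

Only input such as "A1", where nothing numeric appears at the start, makes this form of parse throw.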

I feel like my requirements aren't so exotic that I'm the first to do this, but none of the solutions does exactly what I need. I suppose even I don't know exactly what I need (otherwise I could write my own parser), but I know the above solutions do not work for the reasons indicated. Does any solution exist, or do I need to figure out precisely what I need and write my own code for it?

15 answers
欢心
#2 · 2020-05-17 05:27

You can specify the Locale that you need:

NumberFormat nf = NumberFormat.getInstance(Locale.GERMAN);
double myNumber = nf.parse(myString).doubleValue();

This should work for your example, since the German locale uses a comma as the decimal separator.

不美不萌又怎样
#3 · 2020-05-17 05:27

Unfortunately Double.parseDouble(s) or new BigDecimal(s) seem to be your best options.

You cite localisation concerns, but unfortunately there is no way to reliably support all locales without the user specifying one. It is simply impossible.

Sometimes you can infer the scheme by looking at whether commas or periods come first when both are used, but this isn't always possible, so why even try? It is better to have a system that you know works reliably in certain situations than to rely on one that may work in more situations but can also give bad results...

What does the number 123,456 represent? 123456 or 123.456?
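As a small illustration of that ambiguity, the sketch below parses the same string under two locales. The class name LocaleAmbiguityDemo and the helper parseWith are hypothetical names chosen for this example.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

class LocaleAmbiguityDemo {
    // Hypothetical helper: parses s using the conventions of the given locale.
    // Returns NaN when no number could be read at all.
    static double parseWith(String s, Locale locale) {
        Number n = NumberFormat.getInstance(locale).parse(s, new ParsePosition(0));
        return n == null ? Double.NaN : n.doubleValue();
    }

    public static void main(String[] args) {
        // Same text, two different values depending on who wrote it.
        System.out.println(parseWith("123,456", Locale.US));     // 123456.0 (comma groups thousands)
        System.out.println(parseWith("123,456", Locale.GERMAN)); // 123.456 (comma is the decimal separator)
    }
}
```

Neither result is wrong on its own terms, which is why the locale has to come from the user rather than from the string.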

Just strip commas, spaces, or periods, depending on the locale specified by the user. Default to stripping spaces and commas. If you want to make it stricter, strip only commas OR spaces, not both, and only before the period if there is one. It should also be pretty easy to check manually whether they are spaced properly in threes. In fact, a custom parser might be easiest here.

Here is a bit of a proof of concept. It's a bit (very) messy but I reckon it works, and you get the idea anyways :).

public class StrictNumberParser {
  public double parse(String numberString) throws NumberFormatException {
    numberString = numberString.trim();
    char[] numberChars = numberString.toCharArray();

    Character separator = null;
    int separatorCount = 0;
    boolean noMoreSeparators = false;
    for (int index = 1; index < numberChars.length; index++) {
      char character = numberChars[index];

      if (noMoreSeparators || separatorCount < 3) {
        if (character == '.') {
          if (separator != null) {
            throw new NumberFormatException();
          } else {
            noMoreSeparators = true;
          }
        } else if (separator == null && (character == ',' || character == ' ')) {
          if (noMoreSeparators) {
            throw new NumberFormatException();
          }
          separator = Character.valueOf(character); // new Character(char) is deprecated
          separatorCount = -1;
        } else if (!Character.isDigit(character)) {
          throw new NumberFormatException();
        }

        separatorCount++;
      } else {
        if (character == '.') {
          noMoreSeparators = true;
        } else if (separator == null) {
          if (Character.isDigit(character)) {
            noMoreSeparators = true;
          } else if (character == ',' || character == ' ') {
            separator = Character.valueOf(character); // new Character(char) is deprecated
          } else {
            throw new NumberFormatException();
          }
        } else if (!separator.equals(character)) {
          throw new NumberFormatException();
        }

        separatorCount = 0;
      }
    }

    if (separator != null) {
      if (!noMoreSeparators && separatorCount != 3) {
        throw new NumberFormatException();
      }
      // replace() treats the separator literally; replaceAll() would interpret it as a regex
      numberString = numberString.replace(separator.toString(), "");
    }

    return Double.parseDouble(numberString);
  }

  public void testParse(String testString) {
    try {
      System.out.println("result: " + parse(testString));
    } catch (NumberFormatException e) {
      System.out.println("Couldn't parse number!");
    }
  }

  public static void main(String[] args) {
    StrictNumberParser p = new StrictNumberParser();
    p.testParse("123 45.6");
    p.testParse("123 4567.8");
    p.testParse("123 4567");
    p.testParse("12 45");
    p.testParse("123 456 45");
    p.testParse("345.562,346");
    p.testParse("123 456,789");
    p.testParse("123,456,789");
    p.testParse("123 456 789.52");
    p.testParse("23,456,789");
    p.testParse("3,456,789");
    p.testParse("123 456.12");
    p.testParse("1234567.8");
  }
}

EDIT: Obviously this would need to be extended to recognise scientific notation, but that should be simple enough, especially as you don't have to validate anything after the e; you can just let parseDouble fail if it is badly formed.

It might also be a good idea to properly extend NumberFormat with this: have a getSeparator() for parsed numbers and a setSeparator() for giving the desired output format... This sort of takes care of localisation, but again more work would be needed to support ',' as the decimal separator...

▲ chillily
#4 · 2020-05-17 05:31

You can use ParsePosition to check that a NumberFormat.parse operation consumed the whole string. If the string is fully consumed, you don't have a "1A" situation. If not, you do, and can behave accordingly. See here for a quick outline of the solution and here for the related JDK bug, which was closed as "won't fix" because of the ParsePosition option.

【Aperson】
#5 · 2020-05-17 05:31

Not sure if it meets all your requirements, but the code found here might point you in the right direction?

From the article:

To summarize, the steps for proper input processing are:

  1. Get an appropriate NumberFormat and define a ParsePosition variable.
  2. Set the ParsePosition index to zero.
  3. Parse the input value with parse(String source, ParsePosition parsePosition).
  4. Perform error operations if the input length and ParsePosition index value don't match or if the parsed Number is null.
  5. Otherwise, the value passed validation.
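
The steps above can be sketched roughly as follows. This is not the article's code: the class name StrictParseSketch, the method strictParse, and the use of OptionalDouble to signal failure are assumptions made for this example.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;
import java.util.OptionalDouble;

class StrictParseSketch {
    // Hypothetical helper: returns the parsed value only if the whole input was consumed.
    static OptionalDouble strictParse(String input, Locale locale) {
        if (input == null) {
            return OptionalDouble.empty();
        }
        // Step 1: get a NumberFormat for the locale and define a ParsePosition.
        NumberFormat format = NumberFormat.getInstance(locale);
        // Step 2: the ParsePosition starts at index zero.
        ParsePosition position = new ParsePosition(0);
        // Step 3: parse; this overload returns null on failure instead of throwing.
        Number parsed = format.parse(input, position);
        // Step 4: reject if nothing was parsed or trailing characters remain.
        if (parsed == null || position.getIndex() != input.length()) {
            return OptionalDouble.empty();
        }
        // Step 5: the value passed validation.
        return OptionalDouble.of(parsed.doubleValue());
    }

    public static void main(String[] args) {
        System.out.println(strictParse("1,234.5", Locale.US));   // OptionalDouble[1234.5]
        System.out.println(strictParse("1A", Locale.US));        // OptionalDouble.empty
        System.out.println(strictParse("127.0.0.1", Locale.US)); // OptionalDouble.empty
    }
}
```

One caveat worth testing for your own inputs: this only checks that the string was fully consumed. As far as I know, the default parser does not verify that grouping separators sit at three-digit intervals, so oddly grouped input may still be accepted.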
做个烂人
#6 · 2020-05-17 05:31

This code should handle most inputs, except IP addresses where all groups of digits are in threes (e.g. 255.255.255.255 is accepted as valid, but 255.1.255.255 is not). It also doesn't support scientific notation.

It will work with most variants of separators (",", "." or space). If more than one separator is detected, the first is assumed to be the thousands separator, with additional checks (validity, etc.).

Edit: prevDigits is used to check that the number uses thousands separators correctly. If there is more than one group of thousands, all but the first must be groups of 3 digits. I modified the code to make it clearer, so that "3" is a constant rather than a magic number.

Edit 2: I don't mind the down votes much, but can someone explain what the problem is?

/* A number using thousand separator must have
   groups of 3 digits, except the first one.
   Numbers following the decimal separator can
   of course be unlimited. */
private final static int GROUP_SIZE=3;

public static boolean isNumber(String input) {
    boolean inThousandSep = false;
    boolean inDecimalSep = false;
    boolean endsWithDigit = false;
    char thousandSep = '\0';
    int prevDigits = 0;

    for(int i=0; i < input.length(); i++) {
        char c = input.charAt(i);

        switch(c) {
            case ',':
            case '.':
            case ' ':
                endsWithDigit = false;
                if(inDecimalSep)
                    return false;
                else if(inThousandSep) {
                    if(c != thousandSep)
                        inDecimalSep = true;
                    if(prevDigits != GROUP_SIZE)
                        return false; // Invalid use of separator
                }
                else {
                    if(prevDigits > GROUP_SIZE || prevDigits == 0)
                        return false;
                    thousandSep = c;
                    inThousandSep = true;
                }
                prevDigits = 0;
                break;

            default:
                if(Character.isDigit(c)) {
                    prevDigits++;
                    endsWithDigit = true;
                }
                else {
                    return false;
                }
        }
    }
    return endsWithDigit;
}

Test code:

public static void main(String[] args) {
    System.out.println(isNumber("100"));               // true
    System.out.println(isNumber("100.00"));            // true
    System.out.println(isNumber("1,5"));               // true
    System.out.println(isNumber("1,000,000.00."));     // false
    System.out.println(isNumber("100,00,2"));          // false
    System.out.println(isNumber("123.123.23.123"));    // false
    System.out.println(isNumber("123.123.123.123"));   // true       
}
Animai°情兽
#7 · 2020-05-17 05:33

This takes a string, counts its decimal points and commas, removes the commas, keeps a valid decimal part, checks that the structure is valid, and then returns the value as a double. It returns null if the string could not be converted. Note that this is based on the US convention; to handle 1.000.000,00 as 1 million, the decimal and comma handling have to be swapped. Edit: Added support for international or US formats: convertStoD(string, true) for US, convertStoD(string, false) for non-US. The comments now describe the US version.

// Requires: import java.util.regex.Pattern;
// Returns a boxed Double so that null can signal "not a number".
public static Double convertStoD(String s, boolean isUS) {
    if (s == null || s.isEmpty()) return null;
    boolean isNegative = false;
    if (s.charAt(0) == '-') {
        s = s.substring(1);
        isNegative = true;
    }
    // charAt(0) is the grouping separator, charAt(1) the decimal separator
    String validNumberArguments = isUS ? ",." : ".,";
    char groupSep = validNumberArguments.charAt(0);
    char decimalSep = validNumberArguments.charAt(1);
    int length = s.length();
    int currentCommas = 0;
    int currentDecimals = 0;
    int digits = 0;
    for (int i = 0; i < length; i++) {
        char c = s.charAt(i);
        if (c == groupSep) { // ',' in the US version
            currentCommas++;
            continue;
        }
        if (c == decimalSep) { // '.' in the US version
            currentDecimals++;
            continue;
        }
        if (c < '0' || c > '9') return null; // remove 1 A
        digits++;
    }
    if (digits == 0) return null; // nothing to parse, e.g. "-" or "."
    if (currentDecimals > 1) return null; // remove 1.00.00
    String decimalValue = "";
    if (currentDecimals > 0) {
        int index = s.indexOf(decimalSep);
        String fraction = s.substring(index + 1);
        if (fraction.indexOf(groupSep) != -1) return null; // remove 1.00,000
        decimalValue = "." + fraction; // always '.' so that parseDouble accepts it
        s = s.substring(0, index);
    }
    int allowedCommas = (s.length() - 1) / 3;
    if (currentCommas > allowedCommas) return null; // remove 10,00,000
    // Pattern.quote is needed because '.' is a regex metacharacter;
    // limit -1 keeps trailing empty groups so that "1000," is rejected below
    String[] numberParser = s.split(Pattern.quote(String.valueOf(groupSep)), -1);
    length = numberParser.length;
    StringBuilder returnString = new StringBuilder();
    for (int i = 0; i < length; i++) {
        if (i == 0) {
            // remove 1234,0,000 and a leading separator such as ,000
            if (length > 1 && (numberParser[i].isEmpty() || numberParser[i].length() > 3)) return null;
            returnString.append(numberParser[i]);
            continue;
        }
        if (numberParser[i].length() != 3) return null; // ensure proper 1,000,000
        returnString.append(numberParser[i]);
    }
    returnString.append(decimalValue);
    double answer = Double.parseDouble(returnString.toString());
    return isNegative ? -answer : answer;
}