I'm using a Java StreamTokenizer to extract the various words and numbers of a String but have run into a problem where numbers which include commas are concerned, e.g. 10,567 is being read as 10.0 and ,567.
I also need to remove all non-numeric characters from numbers where they might occur, e.g. $678.00 should be 678.00 or -87 should be 87.
I believe these can be achieved via the whitespaceChars and wordChars methods, but does anyone have any idea how to do it?
The basic StreamTokenizer code at present is:
BufferedReader br = new BufferedReader(new StringReader(text));
StreamTokenizer st = new StreamTokenizer(br);
st.parseNumbers();
st.wordChars(44, 46); // ASCII comma, - , dot.
st.wordChars(48, 57); // ASCII 0 - 9.
st.wordChars(65, 90); // ASCII upper case A - Z.
st.wordChars(97, 122); // ASCII lower case a - z.
while (st.nextToken() != StreamTokenizer.TT_EOF) {
    if (st.ttype == StreamTokenizer.TT_WORD) {
        System.out.println("String: " + st.sval);
    }
    else if (st.ttype == StreamTokenizer.TT_NUMBER) {
        System.out.println("Number: " + st.nval);
    }
}
br.close();
Or could someone suggest a regexp to achieve this? I'm not sure whether a regexp is useful here, given that any parsing would take place after the tokens are read from the string.
Thanks
Mr Morgan.
StreamTokenizer is outdated; it is better to use Scanner. Here is sample code for your problem:
String s = "$23.24 word -123";
Scanner fi = new Scanner(s);
// Anything other than alphanumeric characters,
// comma, dot or negative sign is treated as a delimiter and skipped.
fi.useDelimiter("[^\\p{Alnum},\\.-]");
while (true) {
    if (fi.hasNextInt())
        System.out.println("Int: " + fi.nextInt());
    else if (fi.hasNextDouble())
        System.out.println("Double: " + fi.nextDouble());
    else if (fi.hasNext())
        System.out.println("word: " + fi.next());
    else
        break;
}
If you want the comma to be treated as the decimal separator, use fi.useLocale(Locale.FRANCE);
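As a rough illustration of how the locale changes what Scanner accepts, the sketch below reads a comma-containing token under two locales. The class name LocaleScannerSketch and its two helper methods are made up for this example. With Locale.US the comma is read as a thousands separator, with Locale.FRANCE as the decimal separator.

```java
import java.util.Locale;
import java.util.Scanner;

class LocaleScannerSketch {
    // Hypothetical helper: reads one integer using US conventions,
    // where ',' groups thousands.
    static int readUsInt(String text) {
        Scanner sc = new Scanner(text).useLocale(Locale.US);
        return sc.nextInt();
    }

    // Hypothetical helper: reads one double using French conventions,
    // where ',' is the decimal separator.
    static double readFrenchDouble(String text) {
        Scanner sc = new Scanner(text).useLocale(Locale.FRANCE);
        return sc.nextDouble();
    }

    public static void main(String[] args) {
        System.out.println(readUsInt("10,567"));        // 10567
        System.out.println(readFrenchDouble("3,14"));   // 3.14
    }
}
```

Setting the locale explicitly also keeps the behaviour independent of the default locale of the machine the code runs on.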
Try this:
String sanitizedText = text.replaceAll("[^\\w\\s\\.]", "");
sanitizedText will contain only letters, digits, underscores, whitespace and decimal points; tokenizing it after that should be a breeze.
EDIT
Edited to retain the decimal point as well (at the end of the bracket).
The . character is "special" in regexps, so it gets a backslash escape (inside a character class the escape is optional, but harmless).
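To make the "tokenizing should be a breeze" step concrete, here is a minimal sketch that sanitizes first and then splits on whitespace. The class and method names (SanitizeThenSplit, sanitize, tokens) are invented for illustration, and splitting with split("\\s+") is just one possible way to tokenize.

```java
import java.util.Arrays;

class SanitizeThenSplit {
    // Removes everything except word characters (letters, digits, '_'),
    // whitespace and dots, using the same regexp as above.
    static String sanitize(String text) {
        return text.replaceAll("[^\\w\\s\\.]", "");
    }

    // Splits the sanitized text on runs of whitespace.
    static String[] tokens(String text) {
        return sanitize(text).trim().split("\\s+");
    }

    public static void main(String[] args) {
        // prints [Paid, 678.00, for, 10567, items]
        System.out.println(Arrays.toString(tokens("Paid $678.00 for 10,567 items")));
    }
}
```

Note that a full stop at the end of a sentence survives sanitizing too, so a word such as "items." would keep its trailing dot.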
This worked for me :
String onlyNumericText = text.replaceAll("\\D", "");
String str = "1,222";
StringBuilder sb = new StringBuilder();
// Keep only the digit characters, dropping commas and anything else.
for (int i = 0; i < str.length(); i++) {
    if (Character.isDigit(str.charAt(i))) {
        sb.append(str.charAt(i));
    }
}
return sb.toString(); // "1222"
Sure this can be done with regexp:
s/[^\d\.]//g
However, notice that it eats all commas, which is probably what you want if you are using the American number format, where the comma only separates thousands. In some languages the comma is used instead of the point as the decimal separator, so take care when parsing international data.
I leave it to you to translate this to Java.
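One possible Java spelling of that substitution, as a sketch: String.replaceAll with the same character class. The class name DigitsAndDotsSketch and the helper keepDigitsAndDots are hypothetical names chosen for this example.

```java
class DigitsAndDotsSketch {
    // Java equivalent of s/[^\d\.]//g: delete every character that is
    // neither a digit nor a dot.
    static String keepDigitsAndDots(String s) {
        return s.replaceAll("[^\\d.]", "");
    }

    public static void main(String[] args) {
        System.out.println(keepDigitsAndDots("$678.00")); // 678.00
        System.out.println(keepDigitsAndDots("10,567"));  // 10567
        System.out.println(keepDigitsAndDots("-87"));     // 87
    }
}
```

The backslash has to be doubled inside a Java string literal, and the dot needs no escape inside the brackets.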
Here is code to get a number from a string. For example, given the string "123", I want the number 123.
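// Assumes str contains only the digits 0-9; any other character
// (comma, sign, dot) would give a wrong result.
int getNumber(String str) {
    int i = 0;
    int num = 0;
    int zeroAscii = (int) '0';
    while (i < str.length()) {
        int charAscii = (int) str.charAt(i);
        // Shift the digits read so far one place left and add the new one.
        num = num * 10 + (charAscii - zeroAscii);
        i++;
    }
    return num;
}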
Source : How to get number from string
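If you would rather stay with StreamTokenizer, as in the question, one option is to switch off its number parsing, let it split purely on whitespace, and clean each token afterwards. The following is only a sketch under that assumption: the class name StreamTokenizerSketch, the tokenize helper, and the rule "a token containing a digit is a number" are all choices made for this example.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class StreamTokenizerSketch {
    static List<String> tokenize(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.resetSyntax();            // forget the defaults, including number parsing
        st.wordChars(33, 126);       // printable ASCII (incl. ',', '.', '-', '$') builds words
        st.whitespaceChars(0, 32);   // control characters and space separate tokens
        List<String> tokens = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype != StreamTokenizer.TT_WORD) {
                    continue;        // skip anything outside the ranges above
                }
                String token = st.sval;
                // Assumption: a token containing a digit is a number, so strip
                // everything except digits and the decimal point from it.
                if (token.matches(".*\\d.*")) {
                    token = token.replaceAll("[^\\d.]", "");
                }
                tokens.add(token);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [Paid, 678.00, for, 10567, items, at, 87]
        System.out.println(tokenize("Paid $678.00 for 10,567 items at -87"));
    }
}
```

The cleaned number tokens are still strings here; they can be passed to Double.parseDouble or new BigDecimal(...) if numeric values are needed.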