I'm using StreamTokenizer's quoteChar('"') method to handle quoted strings.
The usual escape sequences such as "\n" and "\t" are recognized and converted to single characters as the string is parsed.
Is there any way to get the string exactly as it appears in the input? Meaning that if I have the string:
Hello\tworld
I want to get:
Hello\tworld
and not:
Hello world
Thanks
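A minimal sketch of the setup I mean (the file name and the token loop are just for illustration):

import java.io.FileReader;
import java.io.IOException;
import java.io.StreamTokenizer;

public class EscapeDemo {
    public static void main(String[] args) throws IOException {
        // input.txt contains a quoted string such as "Hello\tworld"
        StreamTokenizer tokenizer = new StreamTokenizer(new FileReader("input.txt"));
        tokenizer.quoteChar('"');

        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            if (tokenizer.ttype == '"') {
                // prints "Hello<tab>world" -- the \t has already been converted
                System.out.println(tokenizer.sval);
            }
        }
    }
}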
Looking at the StreamTokenizer source, it looks like the escape behavior for strings is hard-coded. I can only think of a few ways to get around it:
- Re-escape the string once you get it back. The problem here is that this won't match exactly what was in the file: \t will be converted back, but \040 will not (a rough sketch of this is shown after the list).
- Insert your own Reader between the source Reader and the StreamTokenizer. Store all the chars read for the last token in a buffer, then trim whitespace from the start of that buffer to get the "raw" token.
- If your tokenizing rules are simple enough, implement your own tokenizer.
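A rough sketch of the first option, re-escaping after the fact. The set of escapes handled here is just an example, and as noted it cannot recover numeric escapes such as \040, which are already lost by the time you have sval:

static String reEscape(String sval) {
    // Turn the characters StreamTokenizer already converted back into
    // their escape-sequence form. Only common single-character escapes
    // are covered here.
    StringBuilder sb = new StringBuilder(sval.length());
    for (char c : sval.toCharArray()) {
        switch (c) {
            case '\t': sb.append("\\t"); break;
            case '\n': sb.append("\\n"); break;
            case '\r': sb.append("\\r"); break;
            case '\\': sb.append("\\\\"); break;
            case '"':  sb.append("\\\""); break;
            default:   sb.append(c);
        }
    }
    return sb.toString();
}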
This is what worked for me:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

public class MyReader extends BufferedReader {

    // You can choose whatever replacement you'd like (one that won't occur in your text)
    public static final char TAB_REPLACEMENT = '\u0000';

    public MyReader(Reader in) {
        super(in);
    }

    // StreamTokenizer pulls characters one at a time via read(),
    // so overriding this single method is enough to swap out tabs
    @Override
    public int read() throws IOException {
        int charVal = super.read();
        if (charVal == '\t') {
            return TAB_REPLACEMENT;
        }
        return charVal;
    }
}
and then create the tokenizer by:
myTokenizer = new StreamTokenizer(new MyReader(new FileReader(file)));
and get the restored sval with
myTokenizer.sval.replace(TAB_REPLACEMENT, '\t')
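Putting it together, a minimal usage sketch (the file name, the quoteChar setup, and the token loop are my assumptions about how this would be wired up; it relies on TAB_REPLACEMENT being visible to the caller, as in the class above):

import java.io.FileReader;
import java.io.IOException;
import java.io.StreamTokenizer;

public class Main {
    public static void main(String[] args) throws IOException {
        StreamTokenizer myTokenizer =
                new StreamTokenizer(new MyReader(new FileReader("input.txt")));
        myTokenizer.quoteChar('"');

        while (myTokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            if (myTokenizer.ttype == '"') {
                // swap the placeholder back to a real tab
                System.out.println(myTokenizer.sval.replace(MyReader.TAB_REPLACEMENT, '\t'));
            }
        }
    }
}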