Java StreamTokenizer

Posted 2019-09-18 08:20

Question:

I'm using the method quoteChar('"') to handle quoted strings. The usual escape sequences such as "\n" and "\t" are recognized and converted to single characters as the string is parsed. Is there any way to get the string exactly as it appears in the input? That is, if I have the string:

Hello\tworld

I want to get

Hello\tworld

and not:

Hello world

Thanks.

Answer 1:

Looking at the StreamTokenizer source, it looks like the escape behavior for strings is hard-coded. I can only think of a few ways to get around it:

  1. Re-escape the string once you get it back. The problem here is that the result won't exactly match what was in the file: \t will be converted back, but \040 (an octal escape for a space) will not.
  2. Insert your own Reader in between the source Reader and the StreamTokenizer. Store all the chars read for the last token in a buffer. Trim whitespace from the start of that buffer to get the "raw" token.
  3. If your tokenizing rules are simple enough, implement your own tokenizer.
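
Option 1 could look roughly like the sketch below. It is only an illustration: the class and method names (ReEscapeSketch, reEscape, firstQuoted) are made up for this example, and the set of characters that get re-escaped is an assumption about what the input uses. It also demonstrates the lossiness mentioned above, since an octal escape for a space comes back as a plain space.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReEscapeSketch {

    // Hypothetical helper: turns decoded control characters back into
    // backslash escapes. It cannot know how a character was originally
    // written, so \040 (octal for a space) stays a plain space.
    static String reEscape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\t': out.append("\\t"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\f': out.append("\\f"); break;
                case '\b': out.append("\\b"); break;
                case '\\': out.append("\\\\"); break;
                case '"':  out.append("\\\""); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Hypothetical helper: returns the decoded value of the first token,
    // which is assumed to be a double-quoted string.
    static String firstQuoted(String source) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(source));
        st.quoteChar('"');
        try {
            st.nextToken();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return st.sval;
    }

    public static void main(String[] args) {
        // Input text is "Hello\tworld" with a literal backslash and t.
        System.out.println(reEscape(firstQuoted("\"Hello\\tworld\"")));   // prints Hello\tworld
        // Input text is "Hello\040world"; the octal escape is lost.
        System.out.println(reEscape(firstQuoted("\"Hello\\040world\""))); // prints Hello world
    }
}
```

If the input never uses octal escapes, this may be good enough; otherwise option 2 or 3 is the safer route.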


Answer 2:

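This is what worked for me:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

public class MyReader extends BufferedReader {
    // Choose any replacement character that cannot occur in your text.
    public static final char TAB_REPLACEMENT = '\u0000';

    public MyReader(Reader in) {
        super(in);
    }

    // StreamTokenizer pulls characters one at a time through read(),
    // so overriding this single method is enough here.
    @Override
    public int read() throws IOException {
        int charVal = super.read();
        // Hide each tab character from the tokenizer behind the placeholder.
        if (charVal == '\t') {
            return TAB_REPLACEMENT;
        }
        return charVal;
    }
}

Then create the tokenizer with:

StreamTokenizer myTokenizer = new StreamTokenizer(new MyReader(new FileReader(file)));

and restore the tabs in each string value with:

myTokenizer.sval.replace(MyReader.TAB_REPLACEMENT, '\t')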
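
The reader in this answer swaps tab characters. For the escape sequences in the question, the same trick can probably be applied to the backslash itself, so that StreamTokenizer never sees an escape to decode. The following is a minimal sketch of that variant, not the answerer's code: the names RawQuoteSketch, BackslashMaskingReader, BACKSLASH_PLACEHOLDER and firstQuotedRaw are hypothetical, and it assumes '\u0000' never occurs in the input.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class RawQuoteSketch {

    // Hypothetical placeholder, assumed never to appear in the input.
    static final char BACKSLASH_PLACEHOLDER = '\u0000';

    // Replaces every backslash with the placeholder before the tokenizer
    // sees it, so no escape sequence is ever decoded.
    static class BackslashMaskingReader extends FilterReader {
        BackslashMaskingReader(Reader in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int c = super.read();
            return c == '\\' ? BACKSLASH_PLACEHOLDER : c;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            // n is -1 at end of stream, in which case the loop does not run.
            for (int i = off; i < off + n; i++) {
                if (buf[i] == '\\') {
                    buf[i] = BACKSLASH_PLACEHOLDER;
                }
            }
            return n;
        }
    }

    // Returns the first token's string value with backslashes restored,
    // or null if the tokenizer produced no string value.
    static String firstQuotedRaw(String source) {
        StreamTokenizer st = new StreamTokenizer(
                new BackslashMaskingReader(new StringReader(source)));
        st.quoteChar('"');
        try {
            st.nextToken();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return st.sval == null ? null : st.sval.replace(BACKSLASH_PLACEHOLDER, '\\');
    }

    public static void main(String[] args) {
        // Input text is "Hello\tworld" with a literal backslash and t.
        System.out.println(firstQuotedRaw("\"Hello\\tworld\""));   // prints Hello\tworld
        System.out.println(firstQuotedRaw("\"Hello\\040world\"")); // prints Hello\040world
    }
}
```

One trade-off to be aware of: with the backslash masked, an escaped quote (\") inside a string no longer counts as an escape, so the string ends at that quote. Backslashes outside quoted strings are also turned into the placeholder, which the default StreamTokenizer treats as whitespace.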