How do you set the delimiter for a scanner to either ; or new line?
I tried:
Scanner.useDelimiter(Pattern.compile("(\n)|;"));
But it doesn't work.
As a general rule, in regex patterns written as Java string literals, you need to double the \.
So, try
Scanner.useDelimiter(Pattern.compile("(\\n)|;"));
or
Scanner.useDelimiter(Pattern.compile("[\\n;]"));
Edit: If \r\n is the problem, you might want to try this:
Scanner.useDelimiter(Pattern.compile("[\\r\\n;]+"));
which matches one or more of \r, \n, and ;.
Note: I haven't tried these.
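For anyone who wants to check the last pattern, here is a minimal sketch. The class name DelimiterCheck, the helper tokens, and the sample input are made up for illustration. Note that useDelimiter is an instance method, so the sketch calls it on a scanner object rather than on the Scanner class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class DelimiterCheck {
    // Hypothetical helper: splits the input on runs of CR, LF, or ';'.
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            // Called on the instance, not on the Scanner class.
            scanner.useDelimiter(Pattern.compile("[\\r\\n;]+"));
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Mixed separators: ';', CRLF, and a bare LF.
        System.out.println(tokens("a;b\r\nc\nd")); // [a, b, c, d]
    }
}
```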
As you've discovered, you needed to look for DOS/network style \r\n (CRLF) line separators instead of the Unix style \n (LF only). But what if the text contains both? That happens a lot; in fact, when I view the source of this very page I see both varieties.
You should get in the habit of looking for both kinds of separator, as well as the older Mac style \r (CR only). Here's one way to do that:
\r?\n|\r
Plugging that into your sample code you get:
scanner.useDelimiter(";|\r?\n|\r");
This is assuming you want to match exactly one newline or semicolon at a time. If you want to match one or more you can do this instead:
scanner.useDelimiter("[;\r\n]+");
Notice, too, how I passed in a regex string instead of a Pattern; Scanner caches the patterns it compiles from strings, so pre-compiling the regex doesn't get you any performance gain.
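To make the difference between the two delimiters concrete, here is a small sketch. The class name SingleVsRepeated, the helper tokens, and the input string are assumptions for illustration. With the one-at-a-time delimiter, two adjacent semicolons produce an empty token between them; with the one-or-more version they collapse into a single delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class SingleVsRepeated {
    // Hypothetical helper: collects every token the scanner yields
    // for the given delimiter regex.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "2;3;;4\r\n5";
        // Exactly one separator per match: ";;" leaves an empty token.
        System.out.println(tokens(input, ";|\r?\n|\r")); // [2, 3, , 4, 5]
        // One or more separators per match: no empty tokens.
        System.out.println(tokens(input, "[;\r\n]+"));   // [2, 3, 4, 5]
    }
}
```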
Looking at the OP's comment, it looks like it was a different line ending (\r\n or CRLF) that was the problem.
Here's my answer, which handles multiple semicolons and line endings in either format (which may or may not be what you want):
Scanner.useDelimiter(Pattern.compile("([\n;]|(\r\n))+"));
e.g. an input file that looks like this:
1
2;3;;4
5
would result in 1,2,3,4,5
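A quick way to reproduce that result without a file is to feed the same content in as a string. This is only a sketch: the class name SampleInputCheck and the helper scan are made up, and the first line ending is written as CRLF and the second as LF to exercise both formats.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class SampleInputCheck {
    // Hypothetical helper: scans the input and joins the tokens with commas.
    static String scan(String input) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(Pattern.compile("([\n;]|(\r\n))+"));
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return String.join(",", tokens);
    }

    public static void main(String[] args) {
        // Same content as the sample file, with one CRLF and one LF.
        System.out.println(scan("1\r\n2;3;;4\n5")); // 1,2,3,4,5
    }
}
```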
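I tried both \n and \\n in the string literal, and both worked in my case. I agree that if you need a literal backslash you have to double it, since it is an escape character. It just so happens that here "\n" ends up matching a newline with or without the extra \: with one backslash the Java compiler puts an actual newline character into the pattern, and with two the regex engine receives \n and interprets it as a newline itself.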
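Here is a small sketch showing why both spellings behave the same for a newline (the class name NewlineEscapes and the helper matchesNewline are made up for illustration). The two literals are different strings, but both compile to a pattern that matches a single newline character.

```java
import java.util.regex.Pattern;

class NewlineEscapes {
    // Hypothetical helper: true if the regex matches exactly one newline.
    static boolean matchesNewline(String regex) {
        return Pattern.compile(regex).matcher("\n").matches();
    }

    public static void main(String[] args) {
        // "\n" is one character (a real newline); "\\n" is two (backslash, n).
        System.out.println("\n".length());         // 1
        System.out.println("\\n".length());        // 2
        // Both forms match a newline once compiled as a regex.
        System.out.println(matchesNewline("\n"));  // true
        System.out.println(matchesNewline("\\n")); // true
    }
}
```

The same does not hold for regex-only escapes such as \d or \s, which have no meaning in a Java string literal and so must be written with a doubled backslash.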