I have a multiline string which is delimited by a set of different delimiters:
(Text1)(DelimiterA)(Text2)(DelimiterC)(Text3)(DelimiterB)(Text4)
I can split this string into its parts using String.split, but it seems that I can't get the actual strings that matched the delimiter regex.
In other words, this is what I get:
Text1
Text2
Text3
Text4
This is what I want:
Text1
DelimiterA
Text2
DelimiterC
Text3
DelimiterB
Text4
Is there any JDK way to split the string using a delimiter regex but also keep the delimiters?
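A minimal sketch of the behaviour described above. It assumes, hypothetically, that the delimiters are the literal strings DelimiterA, DelimiterB and DelimiterC concatenated with the text, since the real input isn't shown. It also shows one JDK-only workaround that the question doesn't mention: splitting at the zero-width positions just before and just after each delimiter (lookahead and lookbehind), so the delimiters survive as separate elements. Java's lookbehind needs a pattern with a bounded maximum length, so this trick does not work for unbounded delimiters such as `o+`.

```java
import java.util.Arrays;

public class SplitDemo {
    // Hypothetical input built from the placeholders in the question.
    static final String INPUT =
        "Text1DelimiterAText2DelimiterCText3DelimiterBText4";

    // Plain String.split: the matched delimiters are discarded.
    static String[] plainSplit(String s) {
        return s.split("Delimiter[ABC]");
    }

    // Split at the empty position before each delimiter (lookahead)
    // and after each delimiter (lookbehind), so the delimiters are kept.
    static String[] lookaroundSplit(String s) {
        return s.split("(?=Delimiter[ABC])|(?<=Delimiter[ABC])");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(plainSplit(INPUT)));
        // [Text1, Text2, Text3, Text4]
        System.out.println(Arrays.toString(lookaroundSplit(INPUT)));
        // [Text1, DelimiterA, Text2, DelimiterC, Text3, DelimiterB, Text4]
    }
}
```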
Tweaked Pattern.split() to include the matched pattern in the list.
Added
Full source
I will post my working versions too (the first is really similar to Markus's).
And here is the second solution; it's around 50% faster than the first one:
Here is a simple, clean implementation which is consistent with Pattern#split and works with variable-length patterns, which lookbehind cannot support, and it is easier to use. It is similar to the solution provided by @cletus.
I don't do null checks here; Pattern#split doesn't, so why should I?
I don't like the if at the end, but it is required for consistency with Pattern#split. Otherwise I would unconditionally append, resulting in an empty string as the last element of the result if the input string ends with the pattern.
I convert to String[] for consistency with Pattern#split, and I use new String[0] rather than new String[result.size()]; see here for why.
Here are my tests:
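A sketch of the approach described in this answer, not the poster's actual code or tests: walk the matches with `Matcher.find()`, add the text before each match and then the match itself, and append the trailing text only when it is non-empty. The class name `KeepDelimiters` is hypothetical. Like the description, it does no null checks, and it assumes the pattern never matches the empty string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeepDelimiters {
    // Splits input around matches of pattern, keeping each match as its own element.
    static String[] split(Pattern pattern, CharSequence input) {
        List<String> result = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        int last = 0; // start of the text piece that follows the previous match
        while (m.find()) {
            result.add(input.subSequence(last, m.start()).toString()); // text before the delimiter
            result.add(m.group());                                      // the delimiter itself
            last = m.end();
        }
        // Without this check, an input ending with the pattern would
        // produce an empty string as the last element.
        if (last != input.length()) {
            result.add(input.subSequence(last, input.length()).toString());
        }
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // A variable-length pattern, which a lookbehind-based split could not handle.
        Pattern p = Pattern.compile("\\d+");
        System.out.println(Arrays.toString(split(p, "a1b22c"))); // [a, 1, b, 22, c]
        System.out.println(Arrays.toString(split(p, "a1b22")));  // [a, 1, b, 22]
        System.out.println(Arrays.toString(split(p, "abc")));    // [abc]
    }
}
```

A leading delimiter still yields a leading empty string (`"1a"` gives `["", "1", "a"]`), which matches what `Pattern#split` does for a non-empty match at the start of the input.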
I like the idea of StringTokenizer because it is enumerable (it implements Enumeration).
But it is also obsolete, replaced by String.split, which returns a boring String[] (and does not include the delimiters).
So I implemented a StringTokenizerEx which is an Iterable, and which takes a true regexp to split a string.
A true regexp means it is not a 'character sequence' repeated to form the delimiter: 'o' will only match 'o', and will split 'ooo' into three delimiters, with two empty strings inside.
But the regexp o+ will return the expected result when splitting "aooob".
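The splitting behaviour of the two regexes can be checked with plain `String.split`, which discards the delimiters (unlike the tokenizer described here) but shows where the cuts fall: a single `o` produces empty strings between adjacent `o`s, while `o+` treats the whole run as one delimiter.

```java
import java.util.Arrays;

public class RunSplitDemo {
    // Each single 'o' is its own delimiter, so empty strings appear between them.
    static final String[] SINGLE = "aooob".split("o");
    // 'o+' matches the whole run "ooo" as one delimiter.
    static final String[] RUN = "aooob".split("o+");

    public static void main(String[] args) {
        System.out.println(Arrays.toString(SINGLE)); // [a, , , b]
        System.out.println(Arrays.toString(RUN));    // [a, b]
    }
}
```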
To use this StringTokenizerEx:
The code of this class is available at DZone Snippets.
As usual for a code-challenge response (one self-contained class with test cases included), copy-paste it (in a 'src/test' directory) and run it. Its main() method illustrates the different usages.
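The DZone code is not reproduced here. As a rough illustration of the idea only, here is a hypothetical `RegexTokens` class (not the author's StringTokenizerEx, and its constructor is made up) that is `Iterable<String>` and yields, in order, the text between matches and the matches themselves. This sketch keeps empty text pieces, so splitting "aooob" on `o` gives a, o, "", o, "", o, b.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTokens implements Iterable<String> {
    private final String input;
    private final Pattern delimiter;

    public RegexTokens(String input, String regex) {
        this.input = input;
        this.delimiter = Pattern.compile(regex);
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private final Matcher m = delimiter.matcher(input);
            private int last = 0;        // start of the next text piece
            private String pendingDelim; // delimiter to return after the current text piece
            private boolean done = false;

            @Override
            public boolean hasNext() {
                return !done;
            }

            @Override
            public String next() {
                if (done) throw new NoSuchElementException();
                if (pendingDelim != null) { // return the delimiter found by the previous call
                    String d = pendingDelim;
                    pendingDelim = null;
                    return d;
                }
                if (m.find()) {             // text up to the next delimiter
                    String text = input.substring(last, m.start());
                    pendingDelim = m.group();
                    last = m.end();
                    return text;
                }
                done = true;                // no more delimiters: return the tail
                return input.substring(last);
            }
        };
    }

    public static void main(String[] args) {
        for (String token : new RegexTokens("aooob", "o+")) {
            System.out.println("[" + token + "]"); // [a], [ooo], [b]
        }
    }
}
```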
Note: (late 2009 edit)
The article Final Thoughts: Java Puzzler: Splitting Hairs does a good job explaining the bizarre behavior of String.split(). Josh Bloch even commented in response to that article:
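The article's content is not reproduced here, but a few well-known `String.split` behaviours illustrate the kind of surprise involved: with the default limit of 0, trailing empty strings are removed, so splitting "," yields an empty array while splitting "" yields one empty string; a negative limit keeps the trailing empties.

```java
import java.util.Arrays;

public class SplitQuirks {
    public static void main(String[] args) {
        // No match: the whole (empty) input comes back as the only element.
        System.out.println("".split(",").length);  // 1
        // One match: both pieces are empty, and trailing empties are dropped.
        System.out.println(",".split(",").length); // 0
        // Trailing empty strings are removed with the default limit of 0...
        System.out.println(Arrays.toString("a,,b,,".split(",")));     // [a, , b]
        // ...but kept with a negative limit.
        System.out.println(Arrays.toString("a,,b,,".split(",", -1))); // [a, , b, , ]
    }
}
```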
The Google common library Guava also contains a Splitter which is:
So it may be worth checking out. From their initial rough documentation (pdf):
Pass the third argument of the StringTokenizer constructor as true. It will return the delimiters as well.
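A minimal sketch using the three-argument constructor `StringTokenizer(String str, String delim, boolean returnDelims)`. Note that `delim` is a set of single characters, not a regex: each delimiter character comes back as its own one-character token, and no empty tokens are produced between adjacent delimiters. The helper name `tokens` is just for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerDemo {
    static List<String> tokens(String s, String delims) {
        List<String> out = new ArrayList<>();
        // returnDelims = true: delimiter characters are returned as tokens too.
        StringTokenizer st = new StringTokenizer(s, delims, true);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("a,b;;c", ",;")); // [a, ,, b, ;, ;, c]
    }
}
```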
I don't know of an existing function in the Java API that does this (which is not to say it doesn't exist), but here's my own implementation (one or more delimiters will be returned as a single token; if you want each delimiter to be returned as a separate token, it will need a bit of adaptation):
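The implementation itself is not reproduced here. A hypothetical sketch of the behaviour described, assuming the delimiters are a set of single characters: it scans the string once and emits each maximal run of delimiter characters as one token, and each run of other characters as another. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class RunTokenizer {
    // Splits s into alternating runs of delimiter and non-delimiter characters.
    static List<String> split(String s, String delimiterChars) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= s.length(); i++) {
            // A token ends at the end of the string, or where the characters
            // switch between delimiter and non-delimiter.
            if (i == s.length()
                    || isDelim(s.charAt(i), delimiterChars) != isDelim(s.charAt(i - 1), delimiterChars)) {
                tokens.add(s.substring(start, i));
                start = i;
            }
        }
        return tokens;
    }

    private static boolean isDelim(char c, String delimiterChars) {
        return delimiterChars.indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(split("a,b;;c", ",;")); // [a, ,, b, ;;, c]
    }
}
```

To return each delimiter as a separate token instead, the run check would need to end a token after every delimiter character, which is the "bit of adaptation" mentioned above.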