In Java, what is the best way to split a string into an array of blocks, when the delimiters at the beginning of each block are different from the delimiters at the end of each block?
For example, suppose I have String string = "abc 1234 xyz abc 5678 xyz". I want to apply some sort of complex split in order to obtain {"1234","5678"}.
The first thing that comes to mind is:
String[] parts = string.split("abc");
for (String part : parts)
{
    String[] blocks = part.split("xyz");
    String data = blocks[0];
    // Do some stuff with the 'data' string
}
Is there a simpler / cleaner / more efficient way of doing it?
My purpose (as you've probably guessed) is to parse an XML document.
I want to split a given XML string into the Inner-XML blocks of a given tag.
For example:
String xml = "<tag>ABC</tag>White Spaces Only<tag>XYZ</tag>";
String[] blocks = Split(xml,"<tag>","</tag>"); // should be {"ABC","XYZ"}
How would you implement String[] Split(String str,String prefix,String suffix)?
Thanks
You can write a regular expression for this type of string…
How about something like
\s*((^abc)|(xyz\s*abc)|(\s*xyz$))\s*
which says abc at the beginning, or xyz at the end, or xyz abc in the middle (modulo some spaces)? This produces an empty value at the beginning, but aside from that, it seems like it'd do what you want.
Depending on how you want to handle spaces, you could adjust this as necessary.
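For what it's worth, here is a minimal sketch of how that pattern could be used with String.split on the example string from the question. The class name DelimiterSplitDemo and the helper blocks are made up for illustration; the only thing taken from the answer is the pattern itself. The helper drops empty pieces, which takes care of the empty value at the beginning.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; only the pattern comes from the answer above.
class DelimiterSplitDemo {
    // The delimiter pattern, escaped for a Java string literal.
    static final String DELIMITERS = "\\s*((^abc)|(xyz\\s*abc)|(\\s*xyz$))\\s*";

    // Splits on the delimiter pattern and skips empty pieces, such as the
    // leading empty string caused by the delimiter match at index 0.
    static List<String> blocks(String input) {
        List<String> result = new ArrayList<>();
        for (String piece : input.split(DELIMITERS)) {
            if (!piece.isEmpty()) {
                result.add(piece);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [1234, 5678]
        System.out.println(blocks("abc 1234 xyz abc 5678 xyz"));
    }
}
```

Without the isEmpty check, split returns {"", "1234", "5678"} for this input, because a non-empty match at the very start of the string yields a leading empty element.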
…but you shouldn't use it for XML.
As I noted in the comments, though, this will work for splitting a non-nested string that has these sorts of delimiters. You won't be able to handle nested cases (e.g., abc abc 12345 xyz xyz) using regular expressions, so you won't be able to handle general XML (which seemed to be your intent). If you actually need to parse XML, use a tool designed for XML (e.g., a parser, an XPath query, etc.).
The best approach is to use one of the dedicated XML parsers. See this discussion about best XML parser for Java.
I found this DOM XML parser example to be a simple and good one.
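In case the linked example is not at hand, a rough sketch of the DOM approach using only the JDK's built-in parser might look like the following. The class name DomInnerText, the helper innerText, and the root wrapper element are assumptions added for illustration: the XML in the question has no single root element, so it is wrapped here to make it well-formed. Note that getTextContent returns the text inside each element, not nested markup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not taken from the linked example.
class DomInnerText {
    // Parses the XML string and returns the text content of every element
    // with the given tag name, in document order.
    static List<String> innerText(String xml, String tagName) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName(tagName);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The <root> wrapper is an assumption; the parser needs one root element.
        String xml = "<root><tag>ABC</tag> <tag>XYZ</tag></root>";
        System.out.println(innerText(xml, "tag")); // prints [ABC, XYZ]
    }
}
```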
IMHO the best solution will be to parse the XML file, which is not a one line thing...
Look here
Here is sample code from another question on SO that parses the document and then navigates it with XPath:
Complete thread of this post here
Don't use regexes here. But you don't have to do full-fledged XML parsing either. Use XPath. The expression to search for in your example would be
The code needed is:
where my example xml file is
This is the most declarative way to do it.
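Since the expression and code did not survive in this post, here is a small sketch of what an XPath-based version could look like with the JDK's javax.xml.xpath API. The expression //tag/text(), the class name XPathInnerText, the helper select, and the root wrapper element are all assumptions for illustration, not necessarily what the answer originally used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class XPathInnerText {
    // Evaluates the XPath expression against the XML string and returns the
    // value of each selected node, in document order.
    static List<String> select(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression,
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeValue());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // The <root> wrapper and the expression are assumptions.
        String xml = "<root><tag>ABC</tag> <tag>XYZ</tag></root>";
        System.out.println(select(xml, "//tag/text()")); // prints [ABC, XYZ]
    }
}
```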