What is the best way to split a string, when the p

Posted 2019-07-30 14:58

Question:

In Java, what is the best way to split a string into an array of blocks, when the delimiters at the beginning of each block are different from the delimiters at the end of each block?

For example, suppose I have String string = "abc 1234 xyz abc 5678 xyz".

I want to apply some sort of complex split in order to obtain {"1234","5678"}.

The first thing that comes to mind is:

String[] parts = string.split("abc");
for (String part : parts)
{
    String[] blocks = part.split("xyz");
    String data = blocks[0];
    // Do some stuff with the 'data' string
}

Is there a simpler / cleaner / more efficient way of doing it?

My purpose (as you've probably guessed) is to parse an XML document.

I want to split a given XML string into the Inner-XML blocks of a given tag.

For example:

String xml = "<tag>ABC</tag>White Spaces Only<tag>XYZ</tag>";
String[] blocks = Split(xml,"<tag>","</tag>"); // should be {"ABC","XYZ"}

How would you implement String[] Split(String str,String prefix,String suffix)?

Thanks

Answer 1:

The best is to use one of the dedicated XML parsers. See this discussion about best XML parser for Java.

I found this DOM XML parser example as a simple and good one.
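
As a rough illustration of the DOM approach, here is a minimal sketch applied to the fragment from the question. The class name DomTagTextSketch, the helper innerText, and the synthetic <root> wrapper are assumptions made for this example, not part of the linked one. The wrapper is there only because the question's fragment has no single root element, and it would break on input that starts with an XML declaration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomTagTextSketch {

    // Hypothetical helper: returns the text content of every <tagName> element,
    // in document order.
    static List<String> innerText(String xmlFragment, String tagName) {
        // Assumption: the fragment has no root element and no XML declaration,
        // so wrapping it in a synthetic <root> makes it well-formed.
        String wrapped = "<root>" + xmlFragment + "</root>";
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(wrapped)));

            NodeList nodes = doc.getElementsByTagName(tagName);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            // Parser configuration, I/O, and SAX errors are all reported the same way here.
            throw new IllegalArgumentException("Could not parse XML fragment", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<tag>ABC</tag>White Spaces Only<tag>XYZ</tag>";
        System.out.println(innerText(xml, "tag")); // prints [ABC, XYZ]
    }
}
```

The text between the two tag elements is simply ignored, because only the matching elements are visited.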



Answer 2:

IMHO the best solution is to parse the XML document properly, which takes more than one line.

Here is sample code from another Stack Overflow question that parses the document and then navigates it with XPath:

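import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// The statements below belong inside a method that declares "throws Exception".
String xml = "<resp><status>good</status><msg>hi</msg></resp>";

// Parse the in-memory string into a DOM document.
InputSource source = new InputSource(new StringReader(xml));

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse(source);

XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();

// Each evaluate call returns the string value of the first matching node.
String msg = xpath.evaluate("/resp/msg", document);
String status = xpath.evaluate("/resp/status", document);

System.out.println("msg=" + msg + ";" + "status=" + status);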

Complete thread of this post here



Answer 3:

You can write a regular expression for this type of string…

How about something like \s*((^abc)|(xyz\s*abc)|(\s*xyz$))\s* which matches abc at the beginning, xyz at the end, or xyz abc in the middle (each with optional surrounding whitespace)? Splitting on it produces an empty string as the first element, but aside from that, it seems to do what you want.

import java.util.Arrays;

public class RegexDelimitersExample {
    public static void main(String[] args) {
        final String string = "abc 1234 xyz abc 5678 xyz";
        final String pattern = "\\s*((^abc)|(xyz\\s*abc)|(\\s*xyz$))\\s*";
        final String[] parts_ = string.split( pattern );
        // parts_[0] is "", because there's nothing before ^abc,
        // so a copy of the rest of the array is what we want.
        final String[] parts = Arrays.copyOfRange( parts_, 1, parts_.length );
        System.out.println( Arrays.deepToString( parts ));
    }
}

Output:

[1234, 5678]

Depending on how you want to handle spaces, you could adjust this as necessary. E.g.,

\s*((^abc)|(xyz\s*abc)|(\s*xyz$))\s*     # original
(^abc\s*)|(\s*xyz\s*abc\s*)|(\s*xyz$)    # no spaces on outside
...                                      # ...

…but you shouldn't use it for XML.

As I noted in the comments, though, this will work for splitting a non-nested string that has these sorts of delimiters. You won't be able to handle nested cases (e.g., abc abc 12345 xyz xyz) using regular expressions, so you won't be able to handle general XML (which seemed to be your intent). If you actually need to parse XML, use a tool designed for XML (e.g., a parser, an XPath query, etc.).
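
For the non-nested case, a capture-group variant avoids the empty leading element and comes close to the Split(str, prefix, suffix) signature from the question. This is only a sketch under that non-nesting assumption: the class name DelimiterSplitSketch is made up for illustration, and the method pairs each prefix with the nearest following suffix, so it does not understand nesting any better than the split pattern does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimiterSplitSketch {

    // Returns the text found between each prefix and the nearest following suffix.
    static String[] split(String str, String prefix, String suffix) {
        // Pattern.quote treats both delimiters as literal text.
        // (.*?) is reluctant, so each match stops at the first suffix.
        // DOTALL lets a block span several lines.
        Pattern pattern = Pattern.compile(
                Pattern.quote(prefix) + "(.*?)" + Pattern.quote(suffix),
                Pattern.DOTALL);
        Matcher matcher = pattern.matcher(str);

        List<String> blocks = new ArrayList<>();
        while (matcher.find()) {
            blocks.add(matcher.group(1));
        }
        return blocks.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Including the spaces in the delimiters keeps them out of the blocks.
        String[] numbers = split("abc 1234 xyz abc 5678 xyz", "abc ", " xyz");
        System.out.println(Arrays.toString(numbers)); // prints [1234, 5678]

        String[] tags = split("<tag>ABC</tag>White Spaces Only<tag>XYZ</tag>",
                "<tag>", "</tag>");
        System.out.println(Arrays.toString(tags)); // prints [ABC, XYZ]
    }
}
```

The second call works only because this particular fragment is flat; it is not a substitute for an XML parser.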



Answer 4:

Don't use regexes here. But you don't have to do full-fledged XML parsing either. Use XPath. The expression to search for in your example would be

//tag/text()

The code needed is (shown here with //bar/text() to match the sample file below):

import org.w3c.dom.NodeList;
import org.xml.sax.*;
import javax.xml.xpath.*;

public class Test {

    public static void main(String[] args) throws Exception {

        InputSource ins = new InputSource("c:/users/ndh/hellos.xml");
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList list = (NodeList)xpath.evaluate("//bar/text()", ins, XPathConstants.NODESET);
        for (int i = 0; i < list.getLength(); i++) {
            System.out.println(list.item(i).getNodeValue());
        }

    }
}

where my example XML file is

<?xml version="1.0"?>
<foo>
    <bar>hello</bar>
    <bar>ohayoo</bar>
    <bar>hola</bar>
</foo>

This is the most declarative way to do it.
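
If the XML is already in memory, as in the question, the same query can probably be run without a file. The following is a sketch along those lines: the class name XPathTextSketch and the helper textNodes are hypothetical, and the input is the sample document above inlined as a string.

```java
import java.io.StringReader;
import java.util.Arrays;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathTextSketch {

    // Hypothetical helper: evaluates an XPath expression that selects text nodes
    // and returns their values in document order.
    static String[] textNodes(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            InputSource source = new InputSource(new StringReader(xml));
            NodeList list =
                    (NodeList) xpath.evaluate(expression, source, XPathConstants.NODESET);

            String[] values = new String[list.getLength()];
            for (int i = 0; i < list.getLength(); i++) {
                values[i] = list.item(i).getNodeValue();
            }
            return values;
        } catch (XPathExpressionException e) {
            // Thrown for an invalid expression or for input that cannot be parsed.
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>"
                + "<foo><bar>hello</bar><bar>ohayoo</bar><bar>hola</bar></foo>";
        System.out.println(Arrays.toString(textNodes(xml, "//bar/text()")));
        // prints [hello, ohayoo, hola]
    }
}
```

Note that the input still has to be a well-formed document with a single root element.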