Java - Removing the double quotes in XML attribute

Posted 2019-03-06 00:56

Question:

I have an XML string that I get via a REST call. However, some of the attributes have corrupted values. For example:

<property name="foo" value="Some corrupted String because of "something" like that"/>

How can I replace every double quote that is neither preceded by value= nor followed by /> with a single quote, so that I get a valid XML string out of the corrupted one, in Java 6?

EDIT:

I tried to adapt this lookahead/lookbehind regex, which was written for Visual Basic, but I could not produce a Java version of it, I guess because the escape characters are incompatible. Here it is:

(?<=^[^""]*""(?>[^""]*""[^""]*"")*[^""]*)"(?! \s+ \w+=|\s* [/?]?" >)|(?<!\w+=)""(?=[^""]*""(?>[^""]*""[^""]*"")*[^""]*$)

Answer 1:

You can use the following regex:

\s+[\w:.-]+="([^"]*(?:"(?!\s+[\w:.-]+="|\s*(?:\/?|\?)>)[^"]*)*)"

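It matches each attribute name/value pair and captures the value in Group 1, which we can then change inside a callback. A double quote inside the value is treated as part of the value unless it is followed by another attribute or by the end of the tag.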
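To make the pieces of that pattern easier to follow, here is a minimal sketch that spells out the same regex with Pattern.COMMENTS and lists what Group 1 captures. The class name AttributePatternDemo and the helper values are made up for this illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttributePatternDemo {
    // The pattern from above, laid out with Pattern.COMMENTS so each part can be annotated.
    // In this mode, whitespace and everything from '#' to the end of the line are ignored.
    static final Pattern ATTRIBUTE = Pattern.compile(
          "\\s+[\\w:.-]+=\"         # whitespace, attribute name, '=' and the opening quote\n"
        + "(                        # group 1: the raw attribute value\n"
        + "  [^\"]*                 #   a run of non-quote characters\n"
        + "  (?:                    #   then, zero or more times:\n"
        + "    \"                   #     a quote that is NOT followed by...\n"
        + "    (?!                  \n"
        + "        \\s+[\\w:.-]+=\" #       ...the start of another attribute\n"
        + "      | \\s*(?:/?|\\?)>  #       ...or the end of the tag: > or /> or ?>\n"
        + "    )                    \n"
        + "    [^\"]*               #     and the non-quote characters after it\n"
        + "  )*                     \n"
        + ")                        \n"
        + "\"                       # the real closing quote",
        Pattern.COMMENTS);

    // Hypothetical helper: returns the raw value (group 1) of every attribute found.
    static List<String> values(String xml) {
        List<String> found = new ArrayList<String>(); // no diamond operator, to stay Java 6 friendly
        Matcher m = ATTRIBUTE.matcher(xml);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String tag = "<property name=\"foo\" value=\"because of \"something\" like that\"/>";
        for (String value : values(tag)) {
            System.out.println("[" + value + "]");
        }
    }
}
```

For the sample tag this should list foo and because of "something" like that, with the stray inner quotes kept inside the second value.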

Here is a Java code demo. It wraps the attribute name and opening quote in an extra group, so the value becomes Group 2:

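import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s =  "<?xml version=\"1.0\" encoding=\"UTF-8\"?> <resources> <resource> <properties> <property name=\"name\" value=\"retrieveFoo\"/>\n<property name=\"foo\" value=\"Some corrupted String because of \"something\" like that\"/>";
StringBuffer result = new StringBuffer(); // Matcher.appendReplacement takes a StringBuffer on Java 6
Matcher m = Pattern.compile("(\\s+[\\w:.-]+=\")([^\"]*(?:\"(?!\\s+[\\w:.-]+=\"|\\s*(?:/?|\\?)>)[^\"]*)*)\"").matcher(s);
while (m.find()) {
    // Group 1 is the attribute name plus the opening quote; group 2 is the raw value.
    String fixed = m.group(1) + m.group(2).replace("\"", "&quot;") + "\"";
    // quoteReplacement keeps any '$' or '\' in the value from being read as a group reference.
    m.appendReplacement(result, Matcher.quoteReplacement(fixed));
}
m.appendTail(result);
System.out.println(result.toString());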

Output:

<?xml version="1.0" encoding="UTF-8"?> <resources> <resource> <properties> <property name="name" value="retrieveFoo"/>
<property name="foo" value="Some corrupted String because of &quot;something&quot; like that"/>
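If the fix is needed in more than one place, the same loop can be wrapped in a method and the result checked with the JDK's own XML parser. The following is only a sketch under that assumption: the class AttributeQuoteRepair and the methods repair and isWellFormed are hypothetical names, and the sample input is a single self-contained tag rather than the full document from the question.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class AttributeQuoteRepair {
    // Same pattern as in the demo: group 1 = name plus opening quote, group 2 = raw value.
    private static final Pattern ATTRIBUTE = Pattern.compile(
        "(\\s+[\\w:.-]+=\")([^\"]*(?:\"(?!\\s+[\\w:.-]+=\"|\\s*(?:/?|\\?)>)[^\"]*)*)\"");

    // Hypothetical helper: escapes stray double quotes inside attribute values.
    static String repair(String xml) {
        StringBuffer out = new StringBuffer(); // StringBuffer keeps this usable on Java 6
        Matcher m = ATTRIBUTE.matcher(xml);
        while (m.find()) {
            String fixed = m.group(1) + m.group(2).replace("\"", "&quot;") + "\"";
            m.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Hypothetical helper: true if the JDK's default DOM parser accepts the string.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false; // parse errors and parser configuration problems both land here
        }
    }

    public static void main(String[] args) {
        String broken = "<property name=\"foo\" value=\"because of \"something\" like that\"/>";
        String repaired = repair(broken);
        System.out.println(repaired);
        System.out.println("before: " + isWellFormed(broken) + ", after: " + isWellFormed(repaired));
    }
}
```

Replacing "&quot;" with "'" in repair gives the single-quote variant the question asked for. The approach remains a heuristic: a stray quote that happens to be followed by whitespace and something shaped like name=" will be taken for a closing quote, so running the output through a parser as above is a reasonable safeguard.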