I have an XML string that I get from a REST call. However, some of the attribute values are corrupted. For example:
<property name="foo" value="Some corrupted String because of "something" like that"/>
How can I replace every double quote that is either not preceded by value= or not followed by /> with a single quote, and so get a valid XML string out of the corrupted one in Java 6?
EDIT:
I have tried to adapt this lookahead/lookbehind regex, which was written for Visual Basic. I could not produce a Java version of it, probably because the escaping rules differ. Here it is:
(?<=^[^""]*""(?>[^""]*""[^""]*"")*[^""]*)"(?! \s+ \w+=|\s* [/?]?" >)|(?<!\w+=)""(?=[^""]*""(?>[^""]*""[^""]*"")*[^""]*$)
You can use the following regex:
\s+[\w:.-]+="([^"]*(?:"(?!\s+[\w:.-]+="|\s*(?:\/?|\?)>)[^"]*)*)"
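It matches any attribute name/value pair and captures the value into Group 1, which we can then change inside a callback. The negative lookahead treats a double quote as part of the value unless it is followed by another attribute or by the end of the tag (>, /> or ?>). In the Java demo below the attribute prefix is captured as well, so the value ends up in Group 2.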
Here is a Java code demo:
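import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "<?xml version=\"1.0\" encoding=\"UTF-8\"?> <resources> <resource> <properties> <property name=\"name\" value=\"retrieveFoo\"/>\n<property name=\"foo\" value=\"Some corrupted String because of \"something\" like that\"/>";
// Matcher.appendReplacement requires a StringBuffer in Java 6.
StringBuffer result = new StringBuffer();
// Group 1: whitespace, attribute name and opening quote. Group 2: the raw attribute value.
Matcher m = Pattern.compile("(\\s+[\\w:.-]+=\")([^\"]*(?:\"(?!\\s+[\\w:.-]+=\"|\\s*(?:/?|\\?)>)[^\"]*)*)\"").matcher(s);
while (m.find()) {
    // Escape the stray quotes inside the value (use "'" instead of "&quot;" to get single quotes).
    String value = m.group(2).replace("\"", "&quot;");
    // quoteReplacement keeps any $ or \ in the value from being read as a group reference.
    m.appendReplacement(result, Matcher.quoteReplacement(m.group(1) + value + "\""));
}
m.appendTail(result);
System.out.println(result.toString());
Output:
<?xml version="1.0" encoding="UTF-8"?> <resources> <resource> <properties> <property name="name" value="retrieveFoo"/>
<property name="foo" value="Some corrupted String because of &quot;something&quot; like that"/>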
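Since the question asks for single quotes and for a valid XML string, here is a minimal sketch of the same approach wrapped in a reusable method, with the replacement text passed in as a parameter and a well-formedness check added. The class name AttributeQuoteFixer and the method names fix and isWellFormed are made up for illustration; the regex is the one from the demo above, and the code sticks to APIs available in Java 6.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative only.
class AttributeQuoteFixer {
    // Group 1: whitespace, attribute name and opening quote. Group 2: raw attribute value.
    private static final Pattern ATTRIBUTE = Pattern.compile(
        "(\\s+[\\w:.-]+=\")([^\"]*(?:\"(?!\\s+[\\w:.-]+=\"|\\s*(?:/?|\\?)>)[^\"]*)*)\"");

    // Replaces every double quote inside an attribute value with quoteReplacement.
    static String fix(String xml, String quoteReplacement) {
        StringBuffer result = new StringBuffer();
        Matcher m = ATTRIBUTE.matcher(xml);
        while (m.find()) {
            String value = m.group(2).replace("\"", quoteReplacement);
            // quoteReplacement() stops $ and \ in the value from being treated specially.
            m.appendReplacement(result, Matcher.quoteReplacement(m.group(1) + value + "\""));
        }
        m.appendTail(result);
        return result.toString();
    }

    // Returns true if the string parses as a well-formed XML document.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String broken = "<property name=\"foo\" value=\"Some corrupted String because of \"something\" like that\"/>";
        String fixed = fix(broken, "'");
        System.out.println(fixed);
        System.out.println(isWellFormed(broken) + " -> " + isWellFormed(fixed));
    }
}
```

Keep in mind that this is a heuristic: a stray quote that happens to be followed by something that looks like another attribute (whitespace, a name and =") or by the end of a tag will still be taken as the closing quote, so it is worth parsing the result before using it.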