I load an XML file into a DOM model and analyze it.
The code for that is:
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class MyTest {
    public static void main(String[] args) {
        Document doc = XMLUtils.fileToDom("MyTest.xml"); // Loads XML data into a DOM
        Element rootElement = doc.getDocumentElement();
        NodeList nodes = rootElement.getChildNodes();
        // Items 0 and 2 are whitespace text nodes, so the <test> elements are at 1 and 3
        Node child1 = nodes.item(1);
        Node child2 = nodes.item(3);
        String str1 = child1.getTextContent();
        String str2 = child2.getTextContent();
        if (str1 != null) {
            System.out.println(str1.equals(str2));
        }
        System.out.println();
        System.out.println(str1);
        System.out.println(str2);
    }
}
MyTest.xml
<tests>
    <test name="1">ff1 &quot;</test>
    <test name="2">ff1 "</test>
</tests>
Result:
true
ff1 "
ff1 "
Desired result:
false
ff1 &quot;
ff1 "
So I need to distinguish these two cases: when the quote is escaped and when it is not.
Please help.
Thank you in advance.
P.S. The code for XMLUtils#fileToDom(String filePath), a snippet from XMLUtils class:
private static DocumentBuilder docNonValidatingBuilder;

static {
    DocumentBuilderFactory dFactory = DocumentBuilderFactory.newInstance();
    dFactory.setNamespaceAware(false);
    dFactory.setValidating(false);
    try {
        docNonValidatingBuilder = dFactory.newDocumentBuilder();
    } catch (ParserConfigurationException e) {
        // Ignored: docNonValidatingBuilder stays null
    }
}

public static DocumentBuilder getNonValidatingBuilder() {
    return docNonValidatingBuilder;
}

public static Document fileToDom(String filePath) {
    Document doc = getNonValidatingBuilder().newDocument();
    File f = new File(filePath);
    if (!f.exists())
        return doc;
    try {
        // An identity transform copies the parsed file into the empty document
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult(doc);
        StreamSource source = new StreamSource(f);
        transformer.transform(source, result);
    } catch (Exception e) {
        return doc;
    }
    return doc;
}
I've taken a look at the source code of Apache Xerces and propose my solution (but it is a monkey patch).
I wrote a simple class:
package a;

import java.io.IOException;

import org.apache.xerces.impl.XMLDocumentScannerImpl;
import org.apache.xerces.parsers.NonValidatingConfiguration;
import org.apache.xerces.xni.XMLString;
import org.apache.xerces.xni.XNIException;
import org.apache.xerces.xni.parser.XMLComponent;

public class MyConfig extends NonValidatingConfiguration {

    private MyScanner myScanner;

    @Override
    @SuppressWarnings("unchecked")
    protected void configurePipeline() {
        if (myScanner == null) {
            myScanner = new MyScanner();
            addComponent((XMLComponent) myScanner);
        }
        super.fProperties.put(DOCUMENT_SCANNER, myScanner);
        super.fScanner = myScanner;
        super.fScanner.setDocumentHandler(this.fDocumentHandler);
        super.fLastComponent = fScanner;
    }

    private static class MyScanner extends XMLDocumentScannerImpl {

        @Override
        protected void scanEntityReference() throws IOException, XNIException {
            // name
            String name = super.fEntityScanner.scanName();
            if (name == null) {
                reportFatalError("NameRequiredInReference", null);
                return;
            }
            super.fDocumentHandler.characters(new XMLString(("&" + name + ";")
                    .toCharArray(), 0, name.length() + 2), null);
            // end
            if (!super.fEntityScanner.skipChar(';')) {
                reportFatalError("SemicolonRequiredInReference",
                        new Object[] { name });
            }
            fMarkupDepth--;
        }
    }
}
You only need to add the following line to your main method before parsing starts:
System.setProperty(
        "org.apache.xerces.xni.parser.XMLParserConfiguration",
        "a.MyConfig");
And you will get the expected result:
false
ff1 &quot;
ff1 "
Looks like you can get the TEXT_NODE child and use getNodeValue
(assuming it's not NULL):
public static String getRawContent(Node n) {
    if (n == null) {
        return null;
    }
    Node n1 = getChild(n, Node.TEXT_NODE);
    if (n1 == null) {
        return null;
    }
    return n1.getNodeValue();
}
Grabbed that from:
http://www.java2s.com/Code/Java/XML/Gettherawtextcontentofanodeornullifthereisnotext.htm
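The snippet above calls a getChild helper that is not shown. A minimal sketch of what it might look like, assuming it simply returns the first direct child with the requested node type (the class name RawContent and the helper body are my assumptions, not the code from the linked page):

```java
import org.w3c.dom.Node;

public class RawContent {

    // Hypothetical helper: first direct child of n with the given node type, or null.
    public static Node getChild(Node n, short type) {
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == type) {
                return c;
            }
        }
        return null;
    }

    // Same logic as the snippet above: value of the first text child, or null.
    public static String getRawContent(Node n) {
        if (n == null) {
            return null;
        }
        Node n1 = getChild(n, Node.TEXT_NODE);
        if (n1 == null) {
            return null;
        }
        return n1.getNodeValue();
    }
}
```

Keep in mind that with a standard JAXP parser the text node's value has already had its entity references expanded, so this probably returns the same string as getTextContent for the elements in the question.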
There is no way to do this for the internal entities. XML does not support this concept. Internal entities are just a different way to write the same PSVI content into the text; once the document is parsed, the two spellings cannot be told apart.
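To illustrate the point, here is a small self-contained sketch using only the standard JAXP DOM parser. The class name EntityDemo and its two helper methods are made up for this example. It parses the XML from the question from a string and compares the text of the two elements:

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class EntityDemo {

    // Parses an XML string with the default JAXP DOM parser.
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Returns the text content of the index-th <test> element.
    public static String testText(Document doc, int index) {
        NodeList tests = doc.getElementsByTagName("test");
        return tests.item(index).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<tests>"
                + "<test name=\"1\">ff1 &quot;</test>"
                + "<test name=\"2\">ff1 \"</test>"
                + "</tests>";
        Document doc = parse(xml);
        // The parser expands &quot; before the DOM is built, so both strings are equal.
        System.out.println(testText(doc, 0).equals(testText(doc, 1))); // prints true
    }
}
```

By the time the DOM exists, the information about which spelling was used in the file is already gone, which is why the other answer has to hook into the scanner itself.<test name=\"2\">ff1 \"</test></tests>");
if (!EntityDemo.testText(doc, 0).equals(EntityDemo.testText(doc, 1))) throw new AssertionError("texts should be equal");
if (!"ff1 \"".equals(EntityDemo.testText(doc, 0))) throw new AssertionError("entity should be expanded");
</test>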