I am having trouble parsing an XML document whose elements have no closing tags. Please see the snippet of the XML below.
I have tried both the SAX and StAX parsers, but they both need well-formed XML with closing tags, such as <Name>XXYY</Name>. As you can see below, this format is a little different. Is there an API that can parse this, or can SAX/StAX be made to do what I want?
<Employees>
<Employee>
<Detail>
<Date>2018014
<Name>XXYY
<Age>0
<LANGUAGE>ENG
<Manager>
<MName>YYXX
<MID>5959
</Manager>
<EmployeeID>1234
</Detail>
</Employee>
</Employees>
You could "fix" the XML by adding all the missing end-tags.
Any start-tag that is followed by text on the same line can be fixed by appending the matching end-tag at the end of that line.
The "followed by text" rule ensures that e.g. the <Manager> tag doesn't get ended on its own line, since it is actually ended 3 lines down.
Example working code:
// Uses java.nio.file, java.nio.charset, java.io.StringReader, javax.xml.parsers,
// javax.xml.transform (plus .dom and .stream), org.w3c.dom and org.xml.sax
// Load file into memory
String xml = new String(Files.readAllBytes(Paths.get("test.xml")), StandardCharsets.UTF_8);
// Add the missing end-tag to every line of the form <tag>text
xml = xml.replaceAll("(?m)^(\\s*)<(\\w+)>([^<]+)$", "$1<$2>$3</$2>");
// Parse then print the XML, to ensure there are no errors
Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
TransformerFactory.newInstance().newTransformer()
        .transform(new DOMSource(document), new StreamResult(System.out));
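Since the question asks about SAX/StAX: once the end-tags have been added, the repaired text is ordinary XML and can be read with StAX as well. Below is a minimal, self-contained sketch of that idea, not a definitive implementation. The class name EndTagFixer and the helpers addMissingEndTags and readLeafValues are hypothetical names made up for this example. It also assumes that values never span lines, so the pattern here uses [^<\r\n] instead of [^<] to keep the text capture on a single line, and it only handles tags without attributes.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, for illustration only
class EndTagFixer {

    // A start-tag followed by text on the same line gets a matching end-tag.
    // Assumption: values stay on one line and tags carry no attributes.
    static String addMissingEndTags(String xml) {
        return xml.replaceAll("(?m)^(\\s*)<(\\w+)>([^<\\r\\n]+)$", "$1<$2>$3</$2>");
    }

    // Reads the repaired XML with StAX and collects the text of each leaf element.
    // Repeated element names overwrite earlier values in this simple sketch.
    static Map<String, String> readLeafValues(String xml) throws XMLStreamException {
        Map<String, String> values = new LinkedHashMap<>();
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        String current = null;               // name of the most recently opened element
        StringBuilder text = new StringBuilder();
        while (reader.hasNext()) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    current = reader.getLocalName();
                    text.setLength(0);
                    break;
                case XMLStreamConstants.CHARACTERS:
                    text.append(reader.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    // Only a leaf closes directly after its own start-tag
                    if (reader.getLocalName().equals(current)) {
                        values.put(current, text.toString().trim());
                    }
                    current = null;
                    break;
                default:
                    break;
            }
        }
        reader.close();
        return values;
    }

    public static void main(String[] args) throws XMLStreamException {
        // The sample from the question, as an in-memory string
        String broken = String.join("\n",
                "<Employees>", "<Employee>", "<Detail>",
                "<Date>2018014", "<Name>XXYY", "<Age>0", "<LANGUAGE>ENG",
                "<Manager>", "<MName>YYXX", "<MID>5959", "</Manager>",
                "<EmployeeID>1234",
                "</Detail>", "</Employee>", "</Employees>");
        Map<String, String> values = readLeafValues(addMissingEndTags(broken));
        values.forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Container elements such as <Manager> are skipped because their end-tag does not directly follow their own start-tag. For a file with many <Employee> entries you would collect one map per employee rather than a single map.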
That appears to be SGML, not XML. I've answered a newer question (for JavaScript/Node.js, but relevant to Java as well) detailing how to use the OpenSP SGML software to create XML from SGML.