How to do HTML to XML conversion to generate closed tags

Posted 2019-01-15 23:57

Question:

How can I convert HTML to XML so that closing tags are generated?

The context is explained here: Error while generating pdf from Html file in Java using iText

When I convert HTML to PDF using iText and XML Worker, the parser complains that the <hr> and <br> tags have no closing tag. If I close them by hand, the conversion to PDF works, but I don't want to add every closing tag manually. How can I automate this?

Answer 1:

You are experiencing this problem because you are feeding HTML to iText's XML Worker. XML Worker requires XML, so you need to convert your HTML into XHTML.

There is an example of how to do this on the official iText site: D00_XHTML

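import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public static void tidyUp(String path) throws IOException {
    File html = new File(path);
    Document doc = Jsoup.parse(html, "US-ASCII");
    // Serialize with XML syntax so void tags such as <br> and <hr> are closed
    doc.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
    byte[] xhtml = doc.html().getBytes(doc.outputSettings().charset());
    File dir = new File("results/xml");
    dir.mkdirs();
    try (FileOutputStream fos = new FileOutputStream(new File(dir, html.getName()))) {
        fos.write(xhtml);
    }
}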

The method takes the path to an ordinary HTML file (similar to yours). The Jsoup library parses the HTML and serializes it as XHTML into a byte array. The example then writes that byte array to disk as an XHTML file, but you can also pass the byte array directly to XML Worker as input.
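
If you want to skip the intermediate file, the same idea can be sketched in memory. This is only an illustration, not part of the iText example: the class name XhtmlTidy and the method toXhtml are hypothetical, and it assumes a reasonably recent jsoup (1.8 or later), where the default HTML output leaves void elements such as <br> unclosed, so the output syntax is switched to XML explicitly.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities;

import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of iText or jsoup.
public class XhtmlTidy {

    /** Parses lenient HTML and returns it serialized as XHTML, encoded as UTF-8. */
    public static byte[] toXhtml(String html) {
        // jsoup repairs the markup and wraps fragments in <html><head><body>
        Document doc = Jsoup.parse(html);
        doc.outputSettings()
           // XML syntax: empty void elements are written self-closed
           .syntax(Document.OutputSettings.Syntax.xml)
           // Restrict named entities to the ones XML itself defines
           .escapeMode(Entities.EscapeMode.xhtml)
           .charset(StandardCharsets.UTF_8);
        return doc.html().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String html = "<p>Line one<br>Line two</p><hr>";
        System.out.println(new String(toXhtml(html), StandardCharsets.UTF_8));
    }
}
```

The returned bytes can be wrapped in a ByteArrayInputStream and handed to XML Worker, for instance through XMLWorkerHelper.getInstance().parseXHtml(writer, document, inputStream) if you are on iText 5 with the xmlworker artifact; check that call against the version you actually use.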



Tags: html xml itext