I am trying to convert HTML (with external CSS) to PDF using iText's XMLWorkerHelper, and I get a run-time exception whenever XMLWorkerHelper parses malformed HTML. For example, when the HTML contains an input tag that is not closed, XMLWorkerHelper cannot parse it and throws a run-time exception.
If I try with proper HTML, where the input tag is closed, it works fine.
How can I convert malformed or complex HTML (along with CSS) to PDF using iText?
Below is my code:

    var test_html = File.ReadAllText("C:/Desking _ Lender Program - Dealertrack.html");
    var test_css = File.ReadAllText("C:/login.css");
    using (var msCss = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(test_css)))
    {
        using (var msHtml = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(test_html)))
        {
            //Parse the HTML
            try
            {
                iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, msHtml, msCss);
            }
            catch { }
        }
    }

It's a bit unclear whether you've decided to use iText7 or iTextSharp (5.x.x), but here's a simple example of the latter using HtmlAgilityPack to clean up malformed HTML:
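A minimal sketch of that approach, assuming the HtmlAgilityPack and iTextSharp 5.x XMLWorker NuGet packages are referenced. The class and method names (`HtmlCleaner`, `CloseTags`, `ToPdf`) are hypothetical and only for illustration. HtmlAgilityPack rewrites void elements such as `<input>` in self-closed form, which is usually enough for XMLWorker to accept the markup, though badly broken pages may still need more cleanup:

```csharp
using System.IO;
using System.Text;
using HtmlAgilityPack;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

// Hypothetical helper: the names here are illustrative, not part of either library.
public static class HtmlCleaner
{
    // Re-serialize the HTML so that empty elements such as <input> are written as <input />.
    public static string CloseTags(string html)
    {
        var hap = new HtmlDocument
        {
            OptionWriteEmptyNodes = true,  // write void elements in self-closed form
            OptionAutoCloseOnEnd = true    // close still-open elements at the end of the document
        };
        hap.LoadHtml(html);
        return hap.DocumentNode.WriteTo();
    }

    // Clean the HTML first, then hand it to XMLWorkerHelper together with the CSS.
    public static byte[] ToPdf(string html, string css)
    {
        string cleaned = CloseTags(html);
        using (var output = new MemoryStream())
        {
            using (var doc = new Document())
            {
                PdfWriter writer = PdfWriter.GetInstance(doc, output);
                doc.Open();
                using (var msHtml = new MemoryStream(Encoding.UTF8.GetBytes(cleaned)))
                using (var msCss = new MemoryStream(Encoding.UTF8.GetBytes(css)))
                {
                    XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, msHtml, msCss);
                }
            } // disposing the Document finishes the PDF
            return output.ToArray();
        }
    }
}
```

With the files from the question, the call would look like `File.WriteAllBytes("out.pdf", HtmlCleaner.ToPdf(test_html, test_css));`, where `out.pdf` is a placeholder path. Letting exceptions propagate instead of using an empty `catch` makes any remaining parse error visible.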
If you are free to choose your iText flavour, please go with iText7 and pdfHTML. It supersedes XMLWorker and supports a wider range of tags as well as CSS3.
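For comparison, a minimal sketch of the pdfHTML route, assuming the iText7 `itext7.pdfhtml` NuGet package is referenced. The class name `PdfHtmlExample` and the `baseUri` parameter are illustrative. pdfHTML parses HTML leniently, so an unclosed `<input>` should not need pre-cleaning. External CSS is picked up from `<link>` elements resolved against the base URI, so the stylesheet is referenced from the HTML rather than passed as a separate stream:

```csharp
using System.IO;
using iText.Html2pdf;

// Hypothetical wrapper: the class and method names are illustrative only.
public static class PdfHtmlExample
{
    // Convert an HTML string to PDF bytes; baseUri is the folder used to resolve
    // relative links to stylesheets and images.
    public static byte[] Convert(string html, string baseUri)
    {
        var props = new ConverterProperties();
        props.SetBaseUri(baseUri);
        using (var output = new MemoryStream())
        {
            HtmlConverter.ConvertToPdf(html, output, props);
            return output.ToArray();
        }
    }
}
```

A call such as `PdfHtmlExample.Convert(test_html, "C:/")` would then resolve a `<link rel="stylesheet" href="login.css">` in the page against `C:/`, assuming the HTML links its stylesheet that way.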