Can iText 5 convert PDF to HTML?

Posted 2019-01-28 21:02

Question:

I used iText 5 to create a nice-looking report that includes some tables and graphs. I wonder whether iText can also convert PDF to HTML, and if so, how?

I believe earlier versions of iText allowed it, but in iText 5 I was not able to find a way to do this.

Answer 1:

No. iText has never converted PDF to HTML, only the reverse.



Answer 2:

Have you had a look at http://www.jpedal.org/pdf_to_html_conversion.php - there is currently a free beta.



Answer 3:

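This is possible with Apache PDFBox, the library Apache Tika uses under the hood for PDFs. PDFBox's PDFText2HTML extracts the text and wraps it in basic HTML, so expect the text to come through but not the table layout or the graphs. The snippet below targets PDFBox 2.x; in 1.x the constructor took an encoding argument such as "UTF-8".

import java.io.IOException;
import java.io.InputStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.tools.PDFText2HTML; // PDFBox 2.x, pdfbox-tools module

public String pdfToHtml(InputStream content) throws IOException {
    try (PDDocument pdDocument = PDDocument.load(content)) { // closes the document when done
        return new PDFText2HTML().getText(pdDocument);
    }
}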
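A minimal round-trip sketch of the PDFBox approach, assuming PDFBox 2.x with both the pdfbox and pdfbox-tools modules on the classpath (PDFBox 3.x changed the loading API). It builds a one-page PDF in memory containing a single line of text, then converts it with PDFText2HTML. The class name PdfToHtmlDemo, the helper makeSamplePdf and the sample text are hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.tools.PDFText2HTML;

// Hypothetical demo class; assumes PDFBox 2.x (pdfbox + pdfbox-tools).
public class PdfToHtmlDemo {

    // Same conversion as in the answer: extracts the text and wraps it in HTML.
    public static String pdfToHtml(InputStream content) throws IOException {
        try (PDDocument document = PDDocument.load(content)) {
            return new PDFText2HTML().getText(document);
        }
    }

    // Hypothetical helper: builds a one-page PDF with a single line of text.
    static byte[] makeSamplePdf(String text) throws IOException {
        try (PDDocument document = new PDDocument()) {
            PDPage page = new PDPage();
            document.addPage(page);
            try (PDPageContentStream stream = new PDPageContentStream(document, page)) {
                stream.beginText();
                stream.setFont(PDType1Font.HELVETICA, 12);
                stream.newLineAtOffset(72, 700); // one inch from the left edge
                stream.showText(text);
                stream.endText();
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            document.save(out);
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = makeSamplePdf("Quarterly report");
        String html = pdfToHtml(new ByteArrayInputStream(pdf));
        // Prints an HTML page whose body contains the extracted text.
        System.out.println(html);
    }
}
```

For a real report you would pass a file stream (for example from java.nio.file.Files.newInputStream) instead of the in-memory bytes; the tables would come out as plain runs of text and the graphs would be dropped.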


Tags: html pdf itext