Is there any way to improve the performance of FlyingSaucer?

Posted 2019-01-23 21:59

I've followed this article to use FlyingSaucer to convert XHTML to PDF and it's brilliant but has one major downfall... it's ridiculously slow!

I'm finding that it takes between 1 and 2 minutes to render a PDF from an XHTML page, regardless of how simple that page is.

Basic code:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.xhtmlrenderer.pdf.ITextRenderer;
import com.lowagie.text.DocumentException;

public class FirstDoc {

    public static void main(String[] args) throws IOException, DocumentException {

        String inputFile = "firstdoc.xhtml";
        String url = new File(inputFile).toURI().toURL().toString();
        String outputFile = "firstdoc.pdf";
        OutputStream os = new FileOutputStream(outputFile);

        ITextRenderer renderer = new ITextRenderer();
        renderer.setDocument(url);
        renderer.layout();
        renderer.createPDF(os);

        os.close();
    }
}

Sample XHTML:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>My First Document</title>
        <style type="text/css"> b { color: green; } </style>
    </head>
    <body>
        <p>
            <b>Greetings Earthlings!</b>
            We've come for your Java.
        </p>
    </body>
</html>

Does anyone know how to improve the performance of FlyingSaucer?

Failing that, is anyone able to recommend an alternative Java library which is effective at rendering a PDF from a URL to an (X)HTML document with external CSS and images generated from URLs?

4 answers
Rolldiameter
#2 · 2019-01-23 22:19

I was facing the same problem as Edd.

Sadly, the approach from Marek Piechut's answer to "Java DocumentBuilder: xml parsing is very slow?" didn't work completely for me: my HTML entities got lost on the way.

DocumentBuilderFactory fac = DocumentBuilderFactory.newInstance();
fac.setNamespaceAware(false);
fac.setValidating(false);
fac.setFeature("http://xml.org/sax/features/namespaces", false);
fac.setFeature("http://xml.org/sax/features/validation", false);
fac.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
fac.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
DocumentBuilder builder = fac.newDocumentBuilder();

What finally did the trick were these lines:

DocumentBuilderFactory fac = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = fac.newDocumentBuilder();
builder.setEntityResolver(FSEntityResolver.instance());

By using Flying Saucer's built-in EntityResolver (FSEntityResolver) to resolve the DTD, parsing got tremendously faster.
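To illustrate why an entity resolver keeps the HTML entities while switching DTD loading off does not, here is a minimal JDK-only sketch. It is not FSEntityResolver's actual implementation: LocalDtdDemo, LOCAL_DTD and LOCAL_RESOLVER are hypothetical names, and the one-line DTD is a drastically shortened stand-in for a locally bundled XHTML DTD. The resolver answers the DOCTYPE's public ID from memory, so nothing is fetched from w3.org and &nbsp; still expands.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

final class LocalDtdDemo {

    // Hypothetical stand-in for a bundled DTD: declares only the one entity used below.
    static final String LOCAL_DTD = "<!ENTITY nbsp \"&#160;\">";

    static final String SAMPLE =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n"
        + "   \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
        + "<body><p>Greetings&nbsp;Earthlings!</p></body></html>";

    // Serves the XHTML Strict DTD from memory instead of letting the parser download it.
    static final EntityResolver LOCAL_RESOLVER = (publicId, systemId) -> {
        if ("-//W3C//DTD XHTML 1.0 Strict//EN".equals(publicId)) {
            return new InputSource(new StringReader(LOCAL_DTD));
        }
        return null; // anything else: let the parser use its default resolution
    };

    // Parses the XHTML with the local resolver and returns the text of the first <p>.
    static String paragraphText(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(LOCAL_RESOLVER);
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
        return doc.getElementsByTagName("p").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String text = paragraphText(SAMPLE);
        // The entity survived if the text contains a no-break space (U+00A0).
        System.out.println("Entity expanded: " + (text.indexOf('\u00A0') >= 0));
    }
}
```

In the real setup, FSEntityResolver.instance() takes the place of the hypothetical LOCAL_RESOLVER.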

在下西门庆
#3 · 2019-01-23 22:21

I would make 2 recommendations:

  1. Profile it.

  2. Wrap the OutputStream in a BufferedOutputStream

  3. Profile it. (Oops ... I'm repeating myself. Well, you get the picture.)
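Short of attaching a real profiler, a first rough measurement can come from timing each stage separately. The sketch below is only an assumption about how one might do that: StageTimer and its Stage interface are hypothetical helpers, the sleeps are placeholders for renderer.setDocument(url) and renderer.layout(), and a ByteArrayOutputStream stands in for the FileOutputStream so the example runs without Flying Saucer or disk access. It also shows the BufferedOutputStream wrapping.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class StageTimer {

    // A unit of work that may throw, e.g. () -> renderer.layout().
    interface Stage {
        void run() throws Exception;
    }

    // Accumulated nanoseconds per stage name, in first-seen order.
    private final Map<String, Long> nanosByStage = new LinkedHashMap<>();

    // Runs the stage and adds its elapsed time to the total for that name.
    void time(String name, Stage stage) throws Exception {
        long start = System.nanoTime();
        try {
            stage.run();
        } finally {
            nanosByStage.merge(name, System.nanoTime() - start, Long::sum);
        }
    }

    Map<String, Long> results() {
        return nanosByStage;
    }

    public static void main(String[] args) throws Exception {
        StageTimer timer = new StageTimer();
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for FileOutputStream
        try (OutputStream os = new BufferedOutputStream(sink)) {
            timer.time("setDocument", () -> Thread.sleep(5)); // placeholder for renderer.setDocument(url)
            timer.time("layout", () -> Thread.sleep(5));      // placeholder for renderer.layout()
            timer.time("createPDF",                           // placeholder for renderer.createPDF(os)
                    () -> os.write("%PDF".getBytes(StandardCharsets.US_ASCII)));
        }
        timer.results().forEach((stage, nanos) ->
                System.out.println(stage + ": " + nanos / 1_000_000 + " ms"));
    }
}
```

If one stage dominates, that is the place to point a proper profiler at.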

霸刀☆藐视天下
#4 · 2019-01-23 22:39

The problem is that you are probably using this code from the linked article:

DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(new StringBufferInputStream(buf.toString()));

This way the builder will try to load the referenced DTD:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Loading and parsing the DTD takes a lot of time.
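One way to check whether the parser really goes after the DTD is to install a resolver that records every request. This is a JDK-only sketch under assumptions: DtdRequestProbe is a hypothetical name, and the resolver answers with an empty stand-in so the probe itself never touches the network. Any entities the DTD would have declared are not available with this probe, so it is a diagnostic, not a fix.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class DtdRequestProbe {

    static final String XHTML =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n"
        + "   \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p>Hi</p></body></html>";

    // Parses the XML and returns every system ID the parser asked the resolver for.
    static List<String> requestedSystemIds(String xml) throws Exception {
        List<String> requested = new ArrayList<>();
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) -> {
            requested.add(systemId);
            // Empty stand-in for the external DTD, so nothing is downloaded.
            return new InputSource(new StringReader(""));
        });
        builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return requested;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Parser asked for: " + requestedSystemIds(XHTML));
    }
}
```

If the DTD URL shows up in the list, a default builder without a resolver would have tried to fetch it over the network.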

If you are using

ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(url); // not setDocument(document)

the DTD won't be resolved by Flying Saucer. If you want to load a Document rather than set a URL, see

We Are One
#5 · 2019-01-23 22:41

Let me start by saying that I used your sample code and sample xhtml, and it "Ran in 2675ms".

I downloaded Flying Saucer R8 and put three of its jars on my classpath:

core-renderer.jar, iText-2.0.8.jar, xml-apis-xerces-2.9.1.jar

I measured the run time by modifying your code with instrumentation...

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.xhtmlrenderer.pdf.ITextRenderer;
import com.lowagie.text.DocumentException;

public class FirstDoc {

    public static void main(String[] args) throws IOException, DocumentException {
        long start = System.currentTimeMillis();
        String inputFile = "firstdoc.xhtml";
        String url = new File(inputFile).toURI().toURL().toString();
        String outputFile = "firstdoc.pdf";
        OutputStream os = new FileOutputStream(outputFile);

        ITextRenderer renderer = new ITextRenderer();
        renderer.setDocument(url);
        renderer.layout();
        renderer.createPDF(os);

        os.close();
        long end = System.currentTimeMillis();
        System.out.println("Ran in " + (end-start) + "ms");
    }
}

Now this library isn't exactly speedy, but it doesn't seem to be taking 1-2 minutes either. So now we need to figure out why it's running so slowly for you. Could you please let us know which JDK you're using and on what platform? Also, which version of Flying Saucer are you using?
