Determine the max resolution (DPI) on a PDF page

Posted 2019-07-24 13:33

Question:

I am using GhostScript.Net to rasterize PDF to page images before sending the page images to the printer. I am doing this so that I can always rasterize to 300dpi. This allows me to print the PDF in a reasonable amount of time regardless of the size of any image in the PDF (mainly scanned PDFs).

However, it strikes me that in some cases there will not be a need to rasterize as high as 300dpi. It may be possible to rasterize to 200dpi or even 100dpi depending on the content of the page.

Has anyone attempted to determine the maximum DPI for the content of a PDF page? Perhaps using iTextSharp?

My current code is this:

        var dpiList = new List<int> {50, 100, 150, 200, 250, 300, 350, 400, 450, 500};

        string inputPdfPath = @"C:\10page.pdf";
        string outputPath = @"C:\Print\";

        var lastInstalledVersion =
            GhostscriptVersionInfo.GetLastInstalledVersion(
                    GhostscriptLicense.GPL | GhostscriptLicense.AFPL,
                    GhostscriptLicense.GPL);

        var rasterizer = new GhostscriptRasterizer();

        rasterizer.Open(inputPdfPath, lastInstalledVersion, true);

        var imageFiles = new List<string>();

        for (int pageNumber = 1; pageNumber <= 10; pageNumber++)
        {
            foreach (var dpi in dpiList)
            {
                string pageFilePath = System.IO.Path.Combine(outputPath,
                    string.Format("{0}-{1}-{2}.png", pageNumber, Guid.NewGuid().ToString("N").Substring(0, 8), dpi));

                // Dispose the rendered bitmap as soon as it has been written to disk.
                using (System.Drawing.Image img = rasterizer.GetPage(dpi, dpi, pageNumber))
                {
                    img.Save(pageFilePath, ImageFormat.Png);
                }
                imageFiles.Add(pageFilePath);

                Console.WriteLine(pageFilePath);
            }
        }

        var imageCount = 0;

        var pd = new PrintDocument();
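        pd.PrintPage += delegate(object o, PrintPageEventArgs args)
        {
            var pageBounds = args.PageBounds;
            var margin = 48;

            var imageBounds = new System.Drawing.Rectangle
            {
                Height = pageBounds.Height - margin,
                Width = pageBounds.Width - margin,
                Location = new System.Drawing.Point(margin / 2, margin / 2)
            };

            // Dispose each image after drawing so its file handle is released.
            using (var i = System.Drawing.Image.FromFile(imageFiles[imageCount]))
            {
                args.Graphics.DrawImage(i, imageBounds);
            }

            imageCount++;

            // Keep every image in one print job: request another page while any remain.
            args.HasMorePages = imageCount < imageFiles.Count;
        };

        // A single Print() call raises PrintPage once per image via HasMorePages.
        if (imageFiles.Count > 0)
        {
            pd.Print();
        }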

Answer 1:

PDF pages don't have a resolution. Images within them can be considered to have a resolution, which is given by the number of image samples in the x direction divided by the width of the image on the page, and the number of image samples in the y direction divided by the height of the image on the page. With the width and height measured in inches, the result is in DPI.

So this leaves calculating the width and height of the image on the page. This is given by the image matrix, modified by the Current Transformation Matrix. So in order to work out the width and height on the page, you need to interpret the content stream up to the point where the image is rendered, tracking the graphics state CTM.
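As a rough sketch of the arithmetic only: in PDF an image is painted into a unit square that the CTM maps onto the page, so the lengths of the transformed unit vectors give the image's width and height in user space units. The code below assumes the default unit of 1/72 inch and that the six CTM numbers have already been obtained from a PDF interpreter or content-stream parser. The class and method names are hypothetical, and nothing here reads a PDF.

```csharp
using System;

public static class ImageResolution
{
    // Hypothetical helper. a, b, c, d are the first four numbers of the CTM
    // [a b c d e f] in effect when the image is painted. The translation
    // (e, f) does not affect the image size, so it is not needed.
    public static (double XDpi, double YDpi) EffectiveDpi(
        double a, double b, double c, double d,
        int samplesWide, int samplesHigh)
    {
        // Lengths of the transformed unit vectors: the image's width and
        // height on the page, in points (assuming 1 unit = 1/72 inch).
        double widthPoints = Math.Sqrt(a * a + b * b);
        double heightPoints = Math.Sqrt(c * c + d * d);

        // Resolution = samples per inch along each axis.
        double xDpi = samplesWide / (widthPoints / 72.0);
        double yDpi = samplesHigh / (heightPoints / 72.0);
        return (xDpi, yDpi);
    }

    public static void Main()
    {
        // Made-up example: a 2550 x 3300 sample scan scaled to 612 x 792 points.
        var (x, y) = EffectiveDpi(612, 0, 0, 792, 2550, 3300);
        Console.WriteLine($"{x:F0} x {y:F0} dpi"); // prints "300 x 300 dpi"
    }
}
```

Using vector lengths rather than just `a` and `d` keeps the result correct when the image is rotated on the page.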

For general PDF files, the only way to know this is to use a PDF interpreter. In the strictly limited case where the whole page content is a single image, you can gamble that the image is scaled to exactly fill the page and simply divide the image width in samples by the media width in inches (points divided by 72), and the image height in samples by the media height in inches, to give the x and y resolutions.
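For that single-image gamble, the calculation might look like the sketch below. It assumes you already have the image's sample counts and the page's media size in points (for example from a PDF library), and that the image really does fill the page. `DpiFromMediaBox` and `PickRasterDpi` are hypothetical names, and the step list simply mirrors the one in the question.

```csharp
using System;
using System.Linq;

public static class SingleImagePage
{
    // Samples per inch along one axis, assuming the image spans the whole
    // media dimension and the media size is given in points (1/72 inch).
    public static double DpiFromMediaBox(int samples, double mediaPoints)
    {
        return samples / (mediaPoints / 72.0);
    }

    // Hypothetical policy: pick the smallest step that is not below the
    // content resolution, falling back to the largest step available.
    public static int PickRasterDpi(double contentDpi, int[] steps)
    {
        var sorted = steps.OrderBy(s => s).ToArray();
        foreach (var step in sorted)
        {
            if (step >= contentDpi)
            {
                return step;
            }
        }
        return sorted[sorted.Length - 1];
    }

    public static void Main()
    {
        // Made-up scan: 1700 x 2200 samples on a 612 x 792 point page.
        double xDpi = DpiFromMediaBox(1700, 612);
        double yDpi = DpiFromMediaBox(2200, 792);

        var steps = new[] { 50, 100, 150, 200, 250, 300 };
        int target = PickRasterDpi(Math.Max(xDpi, yDpi), steps);
        Console.WriteLine(target); // prints 200
    }
}
```

If the page turns out to contain anything besides that one full-page image, this estimate is meaningless, which is why it is only a gamble.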

However this definitely won't work in the general case.