ImageMagick not converting pdfs anymore in AWS Lam

Posted 2020-06-12 04:19

I've had an AWS Lambda function running on S3 objects for the last 18 months, and it died around a month ago after a minor update. I've reverted the update but it's still broken. I've tried the most basic PDF conversion using ImageMagick with no luck, so I think AWS has updated something that caused the PDF module to either be removed or stop working.

This is the basic function, essentially what my core code does, in Node.js 8.10:

gm(response.Body).setFormat("png").stream((err, stdout,stderr) => {
  if (err) {
    console.log('broken');
  }
  const chunks = [];
  stdout.on('data', (chunk) => {
    chunks.push(chunk);
  });
  stdout.on('end', () => {
    console.log('gm done!');
  });
  stderr.on('data', (data) => {
    console.log('std error data ' + data);
  })
});

with the error response:

std error data convert: unable to load module `/usr/lib64/ImageMagick-6.7.8/modules-Q16/coders/pdf.la': file not found

I've also tried moving to Node.js 10.x and using the ImageMagick layer that's available through the AWS Serverless Application Repository. Running the same code there generates this error:

std error data convert: FailedToExecuteCommand `'gs' -sstdout=%stderr -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 '-sDEVICE=pngalpha' -dTextAlphaBits=4 -dGraphicsAlphaBits=4 '-r72x72' '-sOutputFile=/tmp/magick-22TOeBgB4WrfoN%d' '-f/tmp/magick-22KvuEBeuJuyq3' '-f/tmp/magick-22dj24vSktMXsj'' (1) @ error/pdf.c/InvokePDFDelegate/292

In both cases the function works correctly when running on an image file instead.

Based on this, I think both the AWS Node.js 8.10 ImageMagick and the layer for Node.js 10.x are missing the PDF module, but I'm unsure how to add it or why it was removed in the first place. What's the best way to fix this function that was working?

EDIT

So I've downloaded https://github.com/serverlesspub/imagemagick-aws-lambda-2, built the library manually, uploaded it to Lambda and got it working as a layer. However, it doesn't include GhostScript, which is an optional library. I've tried adding it to Makefile_ImageMagick; that builds and the result has some references to Ghostscript, but running it doesn't fix the PDF issue (images still work). What's the best way to add the optional GhostScript library to the Makefile?

4 Answers
ゆ 、 Hurt°
#2 · 2020-06-12 04:29

I had the issue where ghostscript was no longer found.

Previously, I had referenced ghostscript via:

var gs = '/usr/bin/gs';

Since AWS Lambda stopped providing that package, I included it directly in my Lambda function, which worked for me. I downloaded the files from https://github.com/sina-masnadi/lambda-ghostscript, placed them in a folder called 'ghostscript', and then referenced it like so:

var path = require('path')
var gs = path.join(__dirname,"ghostscript","bin","gs")
成全新的幸福
#3 · 2020-06-12 04:31

While the other answers helped, there was still a lot of work to get to a workable solution, so below is how I managed to fix this, specifically for NodeJS.

Download: https://github.com/sina-masnadi/lambda-ghostscript

Zip up the bin directory and upload it to Lambda as a layer.

Add https://github.com/sina-masnadi/node-gs to your NodeJS modules. You can either upload them as part of your project or, the way I did it, as a layer (along with all your other required modules).

Add https://github.com/serverlesspub/imagemagick-aws-lambda-2 as a layer. The best way to do this is to create a new function in Lambda, select "Browse serverless app repository", search for "ImageMagick" and select "image-magick-lambda-layer". (You can also build it yourself and upload it as a layer.)

Add the three layers to your function. I added them in this order:

  1. GhostScript
  2. ImageMagick
  3. NodeJS modules

Add the appPath to the require statement for ImageMagick and GhostScript:

var gm = require("gm").subClass({imageMagick: true, appPath: '/opt/bin/'});
var gs = require('gs');

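Mine was in an async waterfall, so before my previous processing function I added this function to convert the file to a PNG if it wasn't an image already:

  // Converts a PDF body to PNG via Ghostscript; images are passed through unchanged.
  // Assumes fs and fileType are defined earlier in the module.
  function convertIfPdf(response, next) {
    if (fileType !== "pdf") {
      return next(null, response.Body);
    }
    // Ghostscript is run against files, so write the PDF to /tmp first.
    fs.writeFile("/tmp/temp.pdf", response.Body, function (err) {
      if (err) {
        return next(err);
      }
      gs().batch().nopause().executablePath('/opt/bin/./gs').device('png16m').input("/tmp/temp.pdf").output('/tmp/temp.png').exec(function (err, stdout, stderr) {
        if (err || stderr) {
          console.log(err);
          console.log(stderr);
          // Report the failure so the waterfall does not hang.
          return next(err || new Error(String(stderr)));
        }
        next(null, fs.readFileSync('/tmp/temp.png'));
      });
    });
  }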

From then on you can do what you were previously doing in ImageMagick, as the data is in the same format. There may be better ways to do the PDF conversion, but I was having issues with the GS library unless working with files. If there are better ways, let me know.

If you are having issues loading the libraries, make sure the path is correct; it depends on how you zipped them up.
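
One way to check the paths is to log which binaries are actually reachable at the start of the handler. The sketch below is only an illustration: `findMissingBinaries` is a hypothetical helper, and the two paths assume that each layer zip has a top-level bin/ directory (Lambda extracts layers under /opt, so that directory ends up at /opt/bin).

```javascript
const fs = require('fs');

// Return the subset of paths that are missing or not executable.
// findMissingBinaries is a hypothetical helper for debugging layer layouts.
function findMissingBinaries(paths) {
  return paths.filter((p) => {
    try {
      fs.accessSync(p, fs.constants.X_OK);
      return false; // present and executable
    } catch (e) {
      return true; // missing, or present without execute permission
    }
  });
}

// Assumed locations; adjust them to match how the layers were zipped.
const missing = findMissingBinaries(['/opt/bin/gs', '/opt/bin/convert']);
if (missing.length > 0) {
  console.log('Not found or not executable: ' + missing.join(', '));
}
```

If a path shows up as missing, the zip probably contains an extra parent folder, so the binary sits one level deeper than expected.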

做自己的国王
#4 · 2020-06-12 04:46

You can add a layer to your Lambda function to make it work again until 22/07/2019. The ARN of the layer that you need to add is: arn:aws:lambda:::awslayer:AmazonLinux1703

The procedure is described in the AWS post upcoming-updates-to-the-aws-lambda-execution-environment.

Any long-term solution would be wonderful.

The star\"
#5 · 2020-06-12 04:50

I had the same problem: two cloud services processing thousands of PDF pages a day were failing because of the "pdf.la not found" error.

The solution was to switch from ImageMagick to GhostScript to convert PDFs to PNGs, and then use ImageMagick on the PNGs (if needed). This way, ImageMagick never has to deal with PDFs and won't need the pdf.la file.

To use GhostScript on AWS Lambda, just include the gs binary in the function zip file.
