Why is file being corrupted during multipart uploa

2020-06-16 04:59发布

问题:

Have a SPA with a redux client and an express webapi. One of the use cases is to upload a single file from the browser to the express server. Express is using the multer middleware to decode the file upload and place it into an array on the req object. Everything works as expected when running on localhost.

However when the app is deployed to AWS, it does not function as expected. Deployment pushes the express api to an AWS Lambda function, and the redux client static assets are served by Cloudfront CDN. In that environment, the uploaded file does make it to the express server, is handled by multer, and the file does end up as the first (and only) item in the req.files array where it is expected to be.

The problem is that the file contains the wrong bytes. For example when I upload a sample image that is 2795 bytes in length, the file that ends up being decoded by multer is 4903 bytes in length. Other images I have tried always end up becoming larger by approximately the same factor by the time multer decodes and puts them into the req.files array. As a result, the files are corrupted and are not displaying as images.

The file is uploaded like so:

<input type="file" name="files" onChange={this.onUploadFileSelected} />
...
onUploadFileSelected = (e) => {
  const file = e.target.files[0]
  var formData = new FormData()
  formData.append("files", file)
  axios.post('to the url', formData, { withCredentials: true })
  .then(handleSuccessResponse).catch(handleFailResponse)
}

I have tried setting up multer with both MemoryStorage and DiskStorage. Both work, both on localhost and in the aws lambda, however both exhibit the same behavior -- the file is a larger size and corrupted in the store.

I have also tried setting up multer as both a global middleware (via app.use) and as a route-specific middleware on the upload route (via routes.post('the url', multerMiddlware, controller.uploadAction). Again, both exhibit the same behavior. Multer middleware is configured like so:

const multerMiddleware = multer({/* optionally set dest: '/tmp' */})
  .array('files')

One difference is that on localhost, both the client and express are served over http, whereas in aws, both the client and express are served over https. I don't believe this makes a difference, but I have yet been unable to test -- either running localhost over https, or running in aws over http.

Another peculiar thing I noticed was that when the multer middleware is present, other middlewares do not seem to function as expected. Rather than the next() function moving flow down to the controller action, instead, other middlewares will completely exit before the controller action invocation, and when the controller invocation exits, control does not flow back into the middlware after the next() call. When the multer middleware is removed, other middlewares do function as expected. However this observation is on localhost, where the entire end-to-end use case does function as expected.

What could be messing up the uploaded image file payload when deployed to the cloud, but not on localhost? Could it really be https making the difference?

Update 1

When I upload this file (11228 bytes)

Here is the HAR chrome is giving me for the local (expected) file upload:

"postData": {
  "mimeType": "multipart/form-data; boundary=----WebKitFormBoundaryC4EJZBZQum3qcnTL",
  "text": "------WebKitFormBoundaryC4EJZBZQum3qcnTL\r\nContent-Disposition: form-data; name=\"files\"; filename=\"danludwig.png\"\r\nContent-Type: image/png\r\n\r\n\r\n------WebKitFormBoundaryC4EJZBZQum3qcnTL--\r\n"
}

Here is the HAR chrome is giving me for the aws (corrupted) file upload:

"postData": {
  "mimeType": "multipart/form-data; boundary=----WebKitFormBoundaryoTlutFBxvC57UR10",
  "text": "------WebKitFormBoundaryoTlutFBxvC57UR10\r\nContent-Disposition: form-data; name=\"files\"; filename=\"danludwig.png\"\r\nContent-Type: image/png\r\n\r\n\r\n------WebKitFormBoundaryoTlutFBxvC57UR10--\r\n"
}

The corrupted image file that is saved is 19369 bytes in length.

Update 2

I created a text file with the text hello world that is 11 bytes long and uploaded it. It does NOT become corrupted in aws. This is the case even if I upload it with the txt or png suffix, it ends up as 11 bytes in length when persisted.

Update 3

Tried uploading with a much larger text file (12132 bytes long) and had the same result as in update 2 -- the file is persisted intact, not corrupted.

回答1:

Potential answers:

Found this https://forums.aws.amazon.com/thread.jspa?threadID=252327

API Gateway does not natively support multipart form data. It is possible to configure binary passthrough to then handle this multipart data in your integration (your backend integration or Lambda function).

It seems that you may need another approach if you are using API Gateway events in AWS to trigger the lambda that hosts your express server.

Or, you could configure API Gateway to work with binary payloads per https://stackoverflow.com/a/41770688/304832

Or, upload directly from your client to a signed s3 url (or a public one) and use that to trigger another lambda event.

Until we get a chance to try out different API Gateway settings, we found a temporary workaround: using FileReader to convert the file to a base64 text string, then submit that. The upload does not seem to have any issues as long as the payload is text.