I have a requirement to download photos (not in the same directory) from S3, ZIP them, and upload the archive back to S3 using the AWS S3 Java SDK. The ZIP file size can run into gigabytes. Currently I am using AWS Lambda, which limits temporary storage to 500 MB. So I don't want to save the ZIP file on disk; instead, I want to stream the ZIP file (which is being created dynamically from the photos downloaded from S3) directly to S3. I need this using the AWS S3 Java SDK.
The basic idea is to use streaming operations. This way you won't wait until the ZIP is generated on a filesystem, but start uploading as soon as the ZIP algorithm produces any data. Obviously, some data will be buffered in memory, but there is still no need to wait for the whole ZIP to be generated on disk. We'll use stream composition and a PipedInputStream/PipedOutputStream pair in two threads: one to read the data, and the other to ZIP the contents. Here is a version for aws-java-sdk:
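The original snippet isn't preserved here; below is a minimal sketch of the technique, assuming a v1 AmazonS3 client and hypothetical bucket, key, and class names. A producer thread downloads each photo and writes it into a ZipOutputStream wrapped around the writing end of the pipe, while the current thread hands the reading end to putObject.

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectMetadata;

import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class StreamingZipUpload {

    public static void uploadZip(AmazonS3 s3, String bucket,
                                 List<String> photoKeys, String zipKey) throws IOException {
        PipedOutputStream pipedOut = new PipedOutputStream();
        PipedInputStream pipedIn = new PipedInputStream(pipedOut);

        // Producer thread: download each photo and write it into the ZIP stream.
        Thread zipper = new Thread(() -> {
            try (ZipOutputStream zip = new ZipOutputStream(pipedOut)) {
                for (String key : photoKeys) {
                    zip.putNextEntry(new ZipEntry(key));
                    try (InputStream photo = s3.getObject(bucket, key).getObjectContent()) {
                        photo.transferTo(zip);
                    }
                    zip.closeEntry();
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        zipper.start();

        // Consumer (current) thread: upload whatever the ZIP stream produces.
        // The content length is unknown, hence the warning discussed below.
        s3.putObject(bucket, zipKey, pipedIn, new ObjectMetadata());
    }

    public static void main(String[] args) throws IOException {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        uploadZip(s3, "my-bucket",
                List.of("photos/a.jpg", "photos/b.jpg"), "photos.zip");
    }
}
```

Note that ZipOutputStream must be closed in the producer thread, so that the ZIP central directory is written and the pipe's reading end sees end-of-stream.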
However, note that it will print a warning along these lines:

"No content length specified for stream data. Stream contents will be buffered in memory and could result in out of memory errors."

That's because S3 needs to know the size of the data in advance, before the upload, and it's impossible to know the size of the resulting ZIP ahead of time. You can probably try your luck with multipart uploads, but the code will be trickier. The idea, though, is similar: one thread should read the data and feed it into the ZIP stream, while the other thread reads the zipped output and uploads it in parts of at least 5 MB each. After all the parts are uploaded, the multipart upload should be completed.
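A rough sketch of that multipart variant, assuming the v1 SDK and a hypothetical helper class: zipStream is the reading end of the same kind of pipe as above, fed by the zipping thread, and each filled 5 MB buffer becomes one uploaded part.

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
import com.amazonaws.services.s3.model.PartETag;
import com.amazonaws.services.s3.model.UploadPartRequest;
import com.amazonaws.services.s3.model.UploadPartResult;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class MultipartZipUpload {

    // S3 requires every part except the last to be at least 5 MB.
    private static final int PART_SIZE = 5 * 1024 * 1024;

    public static void uploadMultipart(AmazonS3 s3, String bucket, String key,
                                       InputStream zipStream) throws IOException {
        String uploadId = s3.initiateMultipartUpload(
                new InitiateMultipartUploadRequest(bucket, key)).getUploadId();
        List<PartETag> partETags = new ArrayList<>();
        byte[] buffer = new byte[PART_SIZE];
        int partNumber = 1;
        int filled;
        // Read the zipped output in PART_SIZE chunks and upload each as a part.
        while ((filled = zipStream.readNBytes(buffer, 0, PART_SIZE)) > 0) {
            UploadPartResult result = s3.uploadPart(new UploadPartRequest()
                    .withBucketName(bucket).withKey(key)
                    .withUploadId(uploadId)
                    .withPartNumber(partNumber++)
                    .withInputStream(new ByteArrayInputStream(buffer, 0, filled))
                    .withPartSize(filled));
            partETags.add(result.getPartETag());
        }
        // Only after this call does the assembled object become visible in S3.
        s3.completeMultipartUpload(
                new CompleteMultipartUploadRequest(bucket, key, uploadId, partETags));
    }
}
```

With this approach only one part needs to be held in memory at a time, so the whole ZIP never has to fit in memory or on disk.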
Here is an example for aws-java-sdk-2.x:
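Again, the original snippet isn't preserved; this is a minimal sketch assuming the v2 S3Client and hypothetical names. Since the synchronous v2 PutObject path wants the content length up front, the ZIP is buffered in a ByteArrayOutputStream first:

```java
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipUploadV2 {

    public static void uploadZip(S3Client s3, String bucket,
                                 List<String> photoKeys, String zipKey) throws IOException {
        // The whole ZIP is accumulated here, because RequestBody needs
        // to know the content length before the upload starts.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (String key : photoKeys) {
                zip.putNextEntry(new ZipEntry(key));
                try (InputStream photo = s3.getObject(
                        GetObjectRequest.builder().bucket(bucket).key(key).build())) {
                    photo.transferTo(zip);
                }
                zip.closeEntry();
            }
        }
        s3.putObject(
                PutObjectRequest.builder().bucket(bucket).key(zipKey).build(),
                RequestBody.fromBytes(buffer.toByteArray()));
    }
}
```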
It suffers from the same plague: the ZIP needs to be prepared in memory before the upload.
If you're interested, I've prepared a demo project, so you can play with the code.