I have looked at both "AWS S3 Java SDK - Download file help" and "Working with Zip and GZip files in Java".
While they show how to download files from S3 and how to read GZipped files respectively, neither covers a GZipped file located in S3. How would I do this?
Currently I have:
try {
    AmazonS3 s3Client = new AmazonS3Client(
            new ProfileCredentialsProvider());
    String URL = downloadURL.getPrimitiveJavaObject(arg0[0].get());
    S3Object fileObj = s3Client.getObject(getBucket(URL), getFile(URL));
    BufferedReader fileIn = new BufferedReader(new InputStreamReader(
            fileObj.getObjectContent()));
    String fileContent = "";
    String line = fileIn.readLine();
    while (line != null) {
        fileContent += line + "\n";
        line = fileIn.readLine();
    }
    fileObj.close();
    return fileContent;
} catch (IOException e) {
    e.printStackTrace();
    return "ERROR IOEXCEPTION";
}
Clearly, I am not handling the compressed nature of the file, and my output is:
����sU�3204�50�5010�20�24��L,(���O�V�M-.NLOU�R�U�����<s��<#�^�.wߐX�%w���������}C=�%�J3��.�����둚�S�ᜑ���ZQ�T�e��#sr�cdN#瘐:&�
S�BǔJ����P�<��
However, I cannot implement the example from the second question above because the file is not stored locally; it has to be downloaded from S3.
What should I do?
You have to use GZIPInputStream to read a GZIP file. Please try it this way to download a GZip file from S3.
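The code that originally accompanied this answer is not shown here, so the following is only a minimal sketch of the idea. It assumes the object is UTF-8 text; the class name GzipS3ReaderSketch and the gzip helper are hypothetical and exist only so the sketch runs without AWS credentials. In the question's code, the stream passed to readGzipped would be fileObj.getObjectContent().

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipS3ReaderSketch {

    // Decompresses a gzip stream while reading it and returns the text,
    // with each line terminated by "\n" (as in the question's loop).
    static String readGzipped(InputStream compressed) throws IOException {
        StringBuilder content = new StringBuilder();
        // GZIPInputStream sits between the raw bytes and the reader,
        // so the reader only ever sees decompressed data.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                content.append(line).append('\n');
            }
        }
        return content.toString();
    }

    // Hypothetical helper: produces gzip bytes in memory to stand in for
    // the S3 object content. Not part of the original answer.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for fileObj.getObjectContent() from the question.
        InputStream fakeS3Content =
                new ByteArrayInputStream(gzip("first line\nsecond line\n"));
        System.out.print(readGzipped(fakeS3Content));
    }
}
```

The try-with-resources block closes the reader, which in turn closes the wrapped streams, so no separate close call on the content stream should be needed in this sketch.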
I wasn't quite looking for this issue but I did feel like improving the quality of this thread by actually explaining why the already provided solution works.
No, it's not because of the Scanner, as is suggested. It's because the stream is being ungzipped by wrapping fileObj.getObjectContent() in a GZIPInputStream, which unzips the contents. Remove the Scanner but keep the GZIPInputStream and things will still work.
I solved the issue using a Scanner instead of an InputStream. The Scanner takes the GZIPInputStream and reads the unzipped file line by line.
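The original code for this answer is not shown here; a rough sketch of the Scanner approach, assuming UTF-8 text, might look like the following. The class name GzipScannerSketch and the gzip helper are hypothetical and only make the sketch testable without S3; with the question's code, the argument would be fileObj.getObjectContent().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipScannerSketch {

    // Wraps the compressed stream in a GZIPInputStream and lets a Scanner
    // walk the decompressed text line by line.
    static String readGzippedWithScanner(InputStream compressed) throws IOException {
        StringBuilder content = new StringBuilder();
        try (Scanner scanner = new Scanner(new GZIPInputStream(compressed), "UTF-8")) {
            while (scanner.hasNextLine()) {
                content.append(scanner.nextLine()).append('\n');
            }
        }
        return content.toString();
    }

    // Hypothetical helper: builds gzip bytes in memory as test input.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }
}
```

Note that Scanner does not throw an IOException raised while reading; it stops and exposes the error through ioException(), so a BufferedReader may be the safer choice when read errors must not go unnoticed. Either way, the decompression itself is done by GZIPInputStream.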