How to list all AWS S3 objects in a bucket using J

2019-01-10 09:10发布

问题:

What is the simplest way to get a list of all items within an S3 bucket using Java?

List<S3ObjectSummary> s3objects = s3.listObjects(bucketName,prefix).getObjectSummaries();

This example only returns 1000 items.

回答1:

It might be a workaround but this solved my problem:

ObjectListing listing = s3.listObjects( bucketName, prefix );
List<S3ObjectSummary> summaries = listing.getObjectSummaries();

while (listing.isTruncated()) {
   listing = s3.listNextBatchOfObjects (listing);
   summaries.addAll (listing.getObjectSummaries());
}


回答2:

This is direct from AWS documentation:

AmazonS3 s3client = new AmazonS3Client(new ProfileCredentialsProvider());        

ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
    .withBucketName(bucketName)
    .withPrefix("m");
ObjectListing objectListing;

do {
        objectListing = s3client.listObjects(listObjectsRequest);
        for (S3ObjectSummary objectSummary : 
            objectListing.getObjectSummaries()) {
            System.out.println( " - " + objectSummary.getKey() + "  " +
                    "(size = " + objectSummary.getSize() + 
                    ")");
        }
        listObjectsRequest.setMarker(objectListing.getNextMarker());
} while (objectListing.isTruncated());


回答3:

I am processing a large collection of objects generated by our system; we changed the format of the stored data and needed to check each file, determine which ones were in the old format, and convert them. There are other ways to do this, but this one relates to your question.

    ObjectListing list = amazonS3Client.listObjects(contentBucketName, contentKeyPrefix);

    do {                

        List<S3ObjectSummary> summaries = list.getObjectSummaries();

        for (S3ObjectSummary summary : summaries) {

            String summaryKey = summary.getKey();               

            /* Retrieve object */

            /* Process it */

        }

        list = amazonS3Client.listNextBatchOfObjects(list);

    }while (list.isTruncated());


回答4:

As a slightly more concise solution to listing S3 objects when they might be truncated:

ListObjectsRequest request = new ListObjectsRequest().withBucketName(bucketName);
ObjectListing listing = null;

while((listing == null) || (request.getMarker() != null)) {
  listing = s3Client.listObjects(request);
  // do stuff with listing
  request.setMarker(listing.getNextMarker());
}


回答5:

Listing Keys Using the AWS SDK for Java

http://docs.aws.amazon.com/AmazonS3/latest/dev/ListingObjectKeysUsingJava.html

import java.io.IOException;
import com.amazonaws.AmazonClientException;
import com.amazonaws.AmazonServiceException;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.ListObjectsRequest;
import com.amazonaws.services.s3.model.ListObjectsV2Request;
import com.amazonaws.services.s3.model.ListObjectsV2Result;
import com.amazonaws.services.s3.model.ObjectListing;
import com.amazonaws.services.s3.model.S3ObjectSummary;

public class ListKeys {
    private static String bucketName = "***bucket name***";

    public static void main(String[] args) throws IOException {
        AmazonS3 s3client = new AmazonS3Client(new ProfileCredentialsProvider());
        try {
            System.out.println("Listing objects");
            final ListObjectsV2Request req = new ListObjectsV2Request().withBucketName(bucketName);
            ListObjectsV2Result result;
            do {               
               result = s3client.listObjectsV2(req);

               for (S3ObjectSummary objectSummary : 
                   result.getObjectSummaries()) {
                   System.out.println(" - " + objectSummary.getKey() + "  " +
                           "(size = " + objectSummary.getSize() + 
                           ")");
               }
               System.out.println("Next Continuation Token : " + result.getNextContinuationToken());
               req.setContinuationToken(result.getNextContinuationToken());
            } while(result.isTruncated() == true ); 

         } catch (AmazonServiceException ase) {
            System.out.println("Caught an AmazonServiceException, " +
                    "which means your request made it " +
                    "to Amazon S3, but was rejected with an error response " +
                    "for some reason.");
            System.out.println("Error Message:    " + ase.getMessage());
            System.out.println("HTTP Status Code: " + ase.getStatusCode());
            System.out.println("AWS Error Code:   " + ase.getErrorCode());
            System.out.println("Error Type:       " + ase.getErrorType());
            System.out.println("Request ID:       " + ase.getRequestId());
        } catch (AmazonClientException ace) {
            System.out.println("Caught an AmazonClientException, " +
                    "which means the client encountered " +
                    "an internal error while trying to communicate" +
                    " with S3, " +
                    "such as not being able to access the network.");
            System.out.println("Error Message: " + ace.getMessage());
        }
    }
}


回答6:

Gray your solution was strange but you seem like a nice guy.

AmazonS3Client s3Client = new AmazonS3Client(new BasicAWSCredentials( ....

ObjectListing images = s3Client.listObjects(bucketName); 

List<S3ObjectSummary> list = images.getObjectSummaries();
for(S3ObjectSummary image: list) {
    S3Object obj = s3Client.getObject(bucketName, image.getKey());
    writeToFile(obj.getObjectContent());
}


回答7:

I know this is an old post, but this still might be usefull to anyone: The Java/Android SDK on version 2.1 provides a method called setMaxKeys. Like this:

s3objects.setMaxKeys(arg0)

You probably found a solution by now, but please check one answer as correct so that it might help others in the future.



回答8:

This worked for me.

Thread thread = new Thread(new Runnable(){
    @Override
    public void run() {
        try {
            List<String> listing = getObjectNamesForBucket(bucket, s3Client);
            Log.e(TAG, "listing "+ listing);

        }
        catch (Exception e) {
            e.printStackTrace();
            Log.e(TAG, "Exception found while listing "+ e);
        }
    }
});

thread.start();



  private List<String> getObjectNamesForBucket(String bucket, AmazonS3 s3Client) {
        ObjectListing objects=s3Client.listObjects(bucket);
        List<String> objectNames=new ArrayList<String>(objects.getObjectSummaries().size());
        Iterator<S3ObjectSummary> oIter=objects.getObjectSummaries().iterator();
        while (oIter.hasNext()) {
            objectNames.add(oIter.next().getKey());
        }
        while (objects.isTruncated()) {
            objects=s3Client.listNextBatchOfObjects(objects);
            oIter=objects.getObjectSummaries().iterator();
            while (oIter.hasNext()) {
                objectNames.add(oIter.next().getKey());
            }
        }
        return objectNames;
}


回答9:

For those, who are reading this in 2018+. There are two new pagination-hustle-free APIs available: one in AWS SDK for Java 1.x and another one in 2.x.

1.x

There is a new API in Java SDK that allows you to iterate through objects in S3 bucket without dealing with pagination:

AmazonS3 s3 = AmazonS3ClientBuilder.standard().build();

S3Objects.inBucket(s3, "the-bucket").forEach((S3ObjectSummary objectSummary) -> {
    // TODO: Consume `objectSummary` the way you need
    System.out.println(objectSummary.key);
});

This iteration is lazy:

The list of S3ObjectSummarys will be fetched lazily, a page at a time, as they are needed. The size of the page can be controlled with the withBatchSize(int) method.

2.x

The API changed, so here is an SDK 2.x version:

S3Client client = S3Client.builder().region(Region.US_EAST_1).build();
ListObjectsV2Request request = ListObjectsV2Request.builder().bucket("the-bucket").prefix("the-prefix").build();
ListObjectsV2Iterable response = client.listObjectsV2Paginator(request);

for (ListObjectsV2Response page : response) {
    page.contents().forEach((S3Object object) -> {
        // TODO: Consume `object` the way you need
        System.out.println(object.key());
    });
}

ListObjectsV2Iterable is lazy as well:

When the operation is called, an instance of this class is returned. At this point, no service calls are made yet and so there is no guarantee that the request is valid. As you iterate through the iterable, SDK will start lazily loading response pages by making service calls until there are no pages left or your iteration stops. If there are errors in your request, you will see the failures only after you start iterating through the iterable.



回答10:

Try this one out

public void getObjectList(){
        System.out.println("Listing objects");
        ObjectListing objectListing = s3.listObjects(new ListObjectsRequest()
                .withBucketName(bucketName)
                .withPrefix("ads"));
        for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
            System.out.println(" - " + objectSummary.getKey() + "  " +
                               "(size = " + objectSummary.getSize() + ")");
        }
    }

You can all the objects within the bucket with specific prefix.