How many objects are returned by aws s3api list-ob

Published 2020-08-26 20:32

Question:

I am using:

aws s3api list-objects --endpoint-url https://my.end.point/ --bucket my.bucket.name --query 'Contents[].Key' --output text

to get the list of files in a bucket.

The aws s3api list-objects documentation page says that this command returns only up to 1000 objects; however, in my case it returns the names of all files in my bucket. For example, when I run the following command:

aws s3api list-objects --endpoint-url https://my.end.point/ --bucket my.bucket.name --query 'Contents[].Key' --output text | tr "\t" "\n" | wc -l

The output is 13512, which means that more than 13,000 file names were returned.

Am I missing something?

I use the following aws cli version:

aws-cli/1.10.57 Python/2.7.3 Linux/3.2.0-4-amd64 botocore/1.4.47

Answer 1:

Returns some or all (up to 1000) of the objects in a bucket. You can use the request parameters as selection criteria to return a subset of the objects in a bucket. [1]

I think the phrase "(up to 1000)" in the documentation's description is highly misleading. It refers to the maximum page size of each underlying HTTP request that the CLI sends, not to the total number of objects the command prints. The documentation of the --page-size option makes this clear:

The size of each page to get in the AWS service call. This does not affect the number of items returned in the command's output. Setting a smaller page size results in more calls to the AWS service, retrieving fewer items in each call. This can help prevent the AWS service calls from timing out.

This becomes even clearer in the AWS documentation on pagination [2], which says:

For commands that can return a large list of items, the AWS Command Line Interface (AWS CLI) adds three options that you can use to control the number of items included in the output when the AWS CLI calls a service's API to populate the list.

By default, the AWS CLI uses a page size of 1000 and retrieves all available items. For example, if you run aws s3api list-objects on an Amazon S3 bucket that contains 3,500 objects, the CLI makes four calls to Amazon S3, handling the service-specific pagination logic for you in the background and returning all 3,500 objects in the final output.
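To make the distinction between page size and total output concrete, here is a small Python simulation of the pagination loop described above; it makes no AWS calls. The bucket contents and the names BUCKET, list_objects_page and list_all are hypothetical. The sketch assumes ListObjects (v1) semantics without a delimiter: each request returns at most MaxKeys (capped at 1000) keys after a Marker plus an IsTruncated flag, and the client uses the last key of each page as the next Marker.

```python
from bisect import bisect_right

# Hypothetical bucket: 13512 made-up keys, kept in lexicographic order like S3.
BUCKET = [f"file-{i:05d}.txt" for i in range(13512)]


def list_objects_page(keys, marker="", max_keys=1000):
    """Simulates one ListObjects (v1) request: at most max_keys (capped at
    1000) keys that sort after `marker`, plus an IsTruncated flag."""
    start = bisect_right(keys, marker)
    page = keys[start:start + min(max_keys, 1000)]
    return {"Contents": page, "IsTruncated": start + len(page) < len(keys)}


def list_all(keys, page_size=1000):
    """Mimics the CLI default: keep requesting pages, using the last key of
    each page as the next Marker, until IsTruncated is false."""
    result, marker, calls = [], "", 0
    while True:
        resp = list_objects_page(keys, marker, page_size)
        calls += 1
        result.extend(resp["Contents"])
        if not resp["IsTruncated"]:
            return result, calls
        marker = resp["Contents"][-1]


for size in (1000, 100):
    items, calls = list_all(BUCKET, page_size=size)
    print(f"page size {size}: {len(items)} keys in {calls} calls")
# page size 1000: 13512 keys in 14 calls
# page size 100: 13512 keys in 136 calls

print(list_all(BUCKET[:3500])[1])  # 4 calls, as in the quoted 3,500-object example
```

Both page sizes return every key; only the number of requests changes, which is why the question's command printed all 13512 names.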

As Ankit correctly stated, the --max-items option is the right way to limit the result and stop the automatic pagination once that limit is reached:

To include fewer items at a time in the AWS CLI output, use the --max-items option. The AWS CLI still handles pagination with the service as described above, but prints out only the number of items at a time that you specify. [2]
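A hypothetical Python sketch of that behaviour, again without any AWS calls: the simulated CLI still requests full pages from the service, but stops once it has gathered max_items keys and trims the output to that count. The bucket contents and the names list_objects_page and list_with_max_items are made up, and always requesting full pages is an assumption of this sketch rather than a documented detail of the real CLI.

```python
from bisect import bisect_right

# Hypothetical bucket contents; the key names are made up for illustration.
BUCKET = [f"file-{i:05d}.txt" for i in range(13512)]  # already in sorted order


def list_objects_page(keys, marker="", max_keys=1000):
    """Simulated ListObjects (v1) call: up to max_keys keys sorting after marker."""
    start = bisect_right(keys, marker)
    page = keys[start:start + min(max_keys, 1000)]
    return {"Contents": page, "IsTruncated": start + len(page) < len(keys)}


def list_with_max_items(keys, max_items, page_size=1000):
    """Pages through the bucket as usual but stops once max_items keys are
    collected. Returns (printed keys, number of service calls, more_available)."""
    collected, marker, calls, truncated = [], "", 0, True
    while truncated and len(collected) < max_items:
        resp = list_objects_page(keys, marker, page_size)
        calls += 1
        collected.extend(resp["Contents"])
        truncated = resp["IsTruncated"]
        if truncated:
            marker = resp["Contents"][-1]
    # More keys exist if some were dropped from the last page or S3 has more pages.
    more = len(collected) > max_items or truncated
    return collected[:max_items], calls, more


shown, calls, more = list_with_max_items(BUCKET, max_items=2500)
print(len(shown), calls, more)  # 2500 3 True
```

With 2500 items and the default page size, three service calls are made and the extra 500 keys of the last page are simply not printed; a limit larger than the bucket behaves like the default listing.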

References

[1] https://docs.aws.amazon.com/cli/latest/reference/s3api/list-objects.html
[2] https://docs.aws.amazon.com/cli/latest/userguide/cli-usage-pagination.html



Answer 2:

Try using --max-items with the command.

The documentation says that when more items are available than the --max-items value, the command's output includes a NextToken. Pass that value to --starting-token in the next call to continue paging through the results.
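The resume loop can be sketched in Python as follows. cli_invocation is a hypothetical stand-in for one `aws s3api list-objects --max-items N [--starting-token T]` run, and using the last returned key as the token is a simplification made for this sketch only: the real NextToken is an opaque string that should be passed back unchanged. (In boto3, the same knobs appear as the PaginationConfig keys MaxItems, PageSize and StartingToken.)

```python
from bisect import bisect_right

# Hypothetical bucket contents, sorted lexicographically like S3 keys.
BUCKET = [f"file-{i:05d}.txt" for i in range(13512)]


def cli_invocation(keys, max_items, starting_token=None):
    """Simulates one CLI run with --max-items. The token is simply the last key
    returned (an assumption); the real CLI's NextToken is opaque."""
    start = bisect_right(keys, starting_token) if starting_token else 0
    batch = keys[start:start + max_items]
    more = start + len(batch) < len(keys)
    return {"Keys": batch, "NextToken": batch[-1] if more else None}


# Keep calling with the previous NextToken until no token is returned.
all_keys, token, runs = [], None, 0
while True:
    result = cli_invocation(BUCKET, max_items=1000, starting_token=token)
    runs += 1
    all_keys.extend(result["Keys"])
    token = result["NextToken"]
    if token is None:
        break

print(f"{len(all_keys)} keys in {runs} CLI runs")  # 13512 keys in 14 CLI runs
```

Each run prints at most 1000 keys, and the loop ends when a run's output carries no NextToken.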