Unable to paginate EMR cluster using boto

Posted 2019-08-02 16:42

Question:

I have about 55 EMR clusters (all of them terminated) and have been trying to retrieve all 55 using the list_clusters method in boto. I've searched for examples of paginating the result set in boto but couldn't find any. Given this statement:

emr_object.list_clusters(cluster_states=["TERMINATED"], marker="what_should_i_use_here").clusters

I keep getting an InvalidRequestException error:

boto.exception.EmrResponseError: EmrResponseError: 400 Bad Request
<ErrorResponse xmlns="http://elasticmapreduce.amazonaws.com/doc/2009-03-31">
  <Error>
    <Type>Sender</Type>
    <Code>InvalidRequestException</Code>
    <Message>Marker 'what_should_i_use_here' is not valid.</Message>
  </Error>
  <RequestId>555b91bd-c122-11e3-8e31-abc75abdb39d</RequestId>
</ErrorResponse>

What should I provide in the marker param so that I can properly paginate the query?

Thanks!

Answer 1:

I tried

emr_object.describe_jobflows(states=["TERMINATED"])

and it works! This method returns all the clusters.



Answer 2:

You can pass in None the first time round.

If the result object you get back has a marker attribute, there are more results, and you can pass that marker in on the next call, e.g.

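m = None  # None asks for the first page
while True:
    # list_clusters accepts a marker; describe_jobflows does not
    cluster_list_result = emr_object.list_clusters(cluster_states=['TERMINATED'], marker=m)
    # ... do whatever with cluster_list_result.clusters
    # The marker attribute is missing on the last page, so default to None
    m = getattr(cluster_list_result, 'marker', None)
    if m is None:
        break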
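
If you'd rather not repeat that loop everywhere, here is a minimal sketch that wraps it in a generator. The name iter_clusters is made up for this example, and it assumes (as above) that the object returned by list_clusters only carries a marker attribute when more pages remain. I haven't run it against a live account, so treat it as a starting point rather than a tested recipe.

```python
def iter_clusters(emr_object, states):
    """Yield every cluster summary in the given states, one page at a time.

    iter_clusters is a hypothetical helper; emr_object is any object with a
    list_clusters(cluster_states=..., marker=...) method, such as the boto
    EMR connection from the question.
    """
    marker = None  # None requests the first page
    while True:
        page = emr_object.list_clusters(cluster_states=states, marker=marker)
        for cluster in page.clusters or []:
            yield cluster
        # Assumption: the marker attribute is absent (or empty) on the last page
        marker = getattr(page, 'marker', None)
        if not marker:
            break
```

With that in place, list(iter_clusters(emr_object, ['TERMINATED'])) should collect all the terminated clusters, however many pages the service splits them into.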