How to batch_get_item many items at once given a l

Posted 2020-08-26 11:37

Question:

So I have a DynamoDB table with a primary partition key column, foo_id, and no sort key. I have a list of foo_id values and want to get the observations associated with this list of ids.

I figured the best way to do this (?) is to use batch_get_item(), but it's not working out for me.

    # python code
    import boto3
    client = boto3.client('dynamodb')

    # ppk_values = list of `foo_id` values (strings) (< 100 in this example)
    x = client.batch_get_item(
        RequestItems={
            'my_table_name':
                {'Keys': [{'foo_id': {'SS': [id for id in ppk_values]}}]}
        })

I'm using SS because I'm passing a list of strings (list of foo_id values), but I'm getting:

    ClientError: An error occurred (ValidationException) when calling the BatchGetItem operation: The provided key element does not match the schema

So I assume that means it's thinking foo_id contains list values instead of string values, which is wrong.

--> Is that interpretation right? What's the best way to batch query for a bunch of primary partition key values?

Answer 1:

The keys should be given as shown below; they can't be specified as 'SS'.

Basically, a key attribute of the DynamoDB String type has to be matched against a single String value (i.e. not against an SS string set). Each item is looked up separately; this is not like an SQL IN query.

    'Keys': [
        {
            'foo_id': key1
        },
        {
            'foo_id': key2
        }
    ],
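Since every item needs its own key map, a Python list of ids has to be expanded into one map per value. BatchGetItem also caps a single request at 100 keys and rejects duplicate keys within one request, so longer lists must be split. A minimal sketch, assuming the plain (untyped) values accepted by the boto3 resource interface used in this answer; `build_request_items` is a hypothetical helper, not a boto3 API:

```python
from typing import Any, Dict, List

# BatchGetItem accepts at most 100 keys per request.
MAX_KEYS_PER_BATCH = 100


def build_request_items(table_name: str, key_name: str, values: List[Any],
                        batch_size: int = MAX_KEYS_PER_BATCH) -> List[Dict[str, Any]]:
    """Turn a list of partition key values into RequestItems payloads.

    Each value gets its own key map, e.g. {'foo_id': 'abc'}; no string set
    ('SS') is involved, because every key identifies exactly one item.
    """
    # Duplicate keys in one request cause a ValidationException, so drop
    # repeats while keeping the original order (values must be hashable).
    unique_values = list(dict.fromkeys(values))
    batches = []
    for start in range(0, len(unique_values), batch_size):
        chunk = unique_values[start:start + batch_size]
        batches.append({table_name: {'Keys': [{key_name: v} for v in chunk]}})
    return batches


# Example: 250 distinct ids need three calls (100 + 100 + 50 keys).
ids = ['id%d' % i for i in range(250)]
request_batches = build_request_items('my_table_name', 'foo_id', ids)
print(len(request_batches))                            # 3
print(request_batches[0]['my_table_name']['Keys'][0])  # {'foo_id': 'id0'}
```

Each payload would then be sent separately, e.g. `dynamodb.batch_get_item(RequestItems=batch)` for every `batch` in the list.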

Sample code:

You may need to change the table name and key values.

    from __future__ import print_function  # Python 2/3 compatibility
    import decimal
    import json

    import boto3
    from botocore.exceptions import ClientError


    # Helper class to convert a DynamoDB item (numbers come back as Decimal) to JSON.
    class DecimalEncoder(json.JSONEncoder):
        def default(self, o):
            if isinstance(o, decimal.Decimal):
                # Use != rather than > so negative fractions such as -1.5 stay floats.
                if o % 1 != 0:
                    return float(o)
                return int(o)
            return super(DecimalEncoder, self).default(o)


    # endpoint_url points at DynamoDB Local; remove it to use the real service.
    dynamodb = boto3.resource("dynamodb", region_name='us-west-2', endpoint_url="http://localhost:8000")

    email1 = "abc@gmail.com"
    email2 = "bcd@gmail.com"

    try:
        response = dynamodb.batch_get_item(
            RequestItems={
                'users': {
                    # One map per item; plain strings work with the resource API.
                    'Keys': [
                        {'email': email1},
                        {'email': email2},
                    ],
                    'ConsistentRead': True
                }
            },
            ReturnConsumedCapacity='TOTAL'
        )
    except ClientError as e:
        print(e.response['Error']['Message'])
    else:
        # Found items, grouped by table name.
        items = response['Responses']
        print("BatchGetItem succeeded:")
        print(json.dumps(items, indent=4, cls=DecimalEncoder))


Answer 2:

The approved answer no longer works.

For me, the working call format was:

    import boto3
    client = boto3.client('dynamodb')

    # ppk_values = list of `foo_id` values (strings) (< 100 in this example)
    x = client.batch_get_item(
        RequestItems={
            'my_table_name': {
                # The low-level client needs explicit types: 'S' marks a string key.
                'Keys': [{'foo_id': {'S': id}} for id in ppk_values]
            }
        }
    )

The type information was required: "S" for string keys. Without it, I got an error saying the library found a str where it expected a dict. That is, it wanted {'foo_id': {'S': id}} instead of the simpler {'foo_id': id} that I tried first.
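The difference between the answers is the interface: the low-level client expects every attribute value wrapped in a type descriptor ('S' for strings, 'N' for numbers, which are sent as strings), while the boto3 resource layer does this wrapping for you. boto3 ships a complete converter, `boto3.dynamodb.types.TypeSerializer`; the `to_client_key` helper below is hypothetical and only a minimal sketch of the idea for string and number keys:

```python
from decimal import Decimal
from typing import Any, Dict


def to_client_key(key: Dict[str, Any]) -> Dict[str, Dict[str, str]]:
    """Convert a plain key map into the low-level client's typed format.

    Hypothetical helper covering only string and number keys; binary keys
    ('B') are deliberately not handled.
    """
    typed = {}
    for name, value in key.items():
        if isinstance(value, str):
            typed[name] = {'S': value}
        elif isinstance(value, (int, Decimal)) and not isinstance(value, bool):
            # Numbers travel as strings under 'N'. Floats are rejected, mirroring
            # boto3, which expects Decimal instead of float.
            typed[name] = {'N': str(value)}
        else:
            raise TypeError('unsupported key type for %r: %s'
                            % (name, type(value).__name__))
    return typed


ppk_values = ['a1', 'b2', 'c3']
keys = [to_client_key({'foo_id': v}) for v in ppk_values]
print(keys[0])                                      # {'foo_id': {'S': 'a1'}}
print(to_client_key({'year': 2013, 'title': 'Rush'}))
```

The resulting `keys` list is what goes under `'Keys'` in a `client.batch_get_item` call.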



Answer 3:

Boto3 also has a resource-level version of batch_get_item (on the service resource returned by boto3.resource('dynamodb')) that lets you pass in the keys in a more natural, Pythonic way without specifying the types.

You can find a complete and working code example in https://github.com/awsdocs/aws-doc-sdk-examples. That example deals with some additional nuances around retries, but here's a digest of the parts of the code that answer this question:

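    import logging
    import boto3

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger(__name__)

    # The service resource accepts plain Python values for keys (no 'S'/'N' wrappers).
    dynamodb = boto3.resource('dynamodb')

    movie_table = dynamodb.Table('Movies')
    actor_table = dynamodb.Table('Actors')

    # Hypothetical example inputs: (year, title) pairs and actor names.
    movie_list = [(2013, 'Rush'), (2014, 'Boyhood')]
    actor_list = ['Chris Hemsworth', 'Ellar Coltrane']

    batch_keys = {
        movie_table.name: {
            'Keys': [{'year': movie[0], 'title': movie[1]} for movie in movie_list]
        },
        actor_table.name: {
            'Keys': [{'name': actor} for actor in actor_list]
        }
    }

    response = dynamodb.batch_get_item(RequestItems=batch_keys)

    # Found items are grouped by table name under 'Responses'.
    for response_table, response_items in response['Responses'].items():
        logger.info("Got %s items from %s.", len(response_items), response_table)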
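BatchGetItem can return fewer items than requested and list the rest under UnprocessedKeys (for example when throughput is exceeded or the 16 MB response limit is reached); those keys come back in the same shape as RequestItems and can be resent. The linked AWS example handles this with retries; what follows is not that code but a minimal sketch of the same idea. `batch_get_all` and `FakeDynamo` are hypothetical names, and the fake object only stands in for boto3.resource('dynamodb') so the sketch runs without AWS:

```python
import time
from typing import Any, Dict, List


def batch_get_all(service: Any, request_items: Dict[str, Any],
                  max_retries: int = 5, base_delay: float = 0.05) -> Dict[str, List[Any]]:
    """Call batch_get_item until no UnprocessedKeys remain or retries run out.

    `service` is assumed to be anything with a batch_get_item(RequestItems=...)
    method, such as boto3.resource('dynamodb') or boto3.client('dynamodb').
    """
    results: Dict[str, List[Any]] = {}
    pending = request_items
    attempt = 0
    while pending:
        response = service.batch_get_item(RequestItems=pending)
        # Found items are grouped by table name under 'Responses'.
        for table_name, items in response.get('Responses', {}).items():
            results.setdefault(table_name, []).extend(items)
        # Keys DynamoDB did not process are returned in RequestItems shape.
        pending = response.get('UnprocessedKeys') or {}
        if pending:
            attempt += 1
            if attempt > max_retries:
                raise RuntimeError('keys still unprocessed after %d retries' % max_retries)
            # Exponential backoff before resending the leftover keys.
            time.sleep(base_delay * (2 ** (attempt - 1)))
    return results


class FakeDynamo:
    """Hypothetical stand-in: leaves all but one key unprocessed on the first call."""

    def __init__(self):
        self.calls = 0

    def batch_get_item(self, RequestItems):
        self.calls += 1
        keys = RequestItems['Movies']['Keys']
        if self.calls == 1:
            return {'Responses': {'Movies': [dict(keys[0])]},
                    'UnprocessedKeys': {'Movies': {'Keys': keys[1:]}}}
        return {'Responses': {'Movies': [dict(k) for k in keys]},
                'UnprocessedKeys': {}}


fake = FakeDynamo()
found = batch_get_all(fake, {'Movies': {'Keys': [{'year': 2013, 'title': 'Rush'},
                                                 {'year': 2014, 'title': 'Boyhood'}]}})
print(fake.calls, len(found['Movies']))  # 2 2
```

With a real table, `batch_get_all(dynamodb, batch_keys)` would take the place of the single `batch_get_item` call above.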