I have a DynamoDB table with a primary partition key column, foo_id, and no primary sort key. I have a list of foo_id values and want to get the observations associated with this list of IDs. I figured the best way to do this is with batch_get_item(), but it's not working out for me.
# python code
import boto3

client = boto3.client('dynamodb')

# ppk_values = list of `foo_id` values (strings) (< 100 in this example)
x = client.batch_get_item(
    RequestItems={
        'my_table_name': {
            'Keys': [{'foo_id': {'SS': [id for id in ppk_values]}}]
        }
    }
)
I'm using SS because I'm passing a list of strings (a list of foo_id values), but I'm getting:
ClientError: An error occurred (ValidationException) when calling the
BatchGetItem operation: The provided key element does not match the
schema
So I assume that means it thinks foo_id contains list values instead of string values, which is wrong.
--> Is that interpretation right? What's the best way to batch-query for a bunch of primary partition key values?
The keys should be given as shown below; they can't be wrapped in 'SS'. A DynamoDB String key is matched against a single string value (i.e. not against a string set), and each item's key is specified separately. It is not like an IN clause in an SQL query.
'Keys': [
    {
        'foo_id': key1
    },
    {
        'foo_id': key2
    }
],
Sample code (you may need to change the table name and key values):
from __future__ import print_function  # Python 2/3 compatibility
import decimal
import json

import boto3
from botocore.exceptions import ClientError

# Helper class to convert a DynamoDB item to JSON.
class DecimalEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, decimal.Decimal):
            if o % 1 > 0:
                return float(o)
            else:
                return int(o)
        return super(DecimalEncoder, self).default(o)

dynamodb = boto3.resource("dynamodb", region_name='us-west-2', endpoint_url="http://localhost:8000")

email1 = "abc@gmail.com"
email2 = "bcd@gmail.com"

try:
    response = dynamodb.batch_get_item(
        RequestItems={
            'users': {
                'Keys': [
                    {
                        'email': email1
                    },
                    {
                        'email': email2
                    },
                ],
                'ConsistentRead': True
            }
        },
        ReturnConsumedCapacity='TOTAL'
    )
except ClientError as e:
    print(e.response['Error']['Message'])
else:
    item = response['Responses']
    print("BatchGetItem succeeded:")
    print(json.dumps(item, indent=4, cls=DecimalEncoder))
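Note that BatchGetItem accepts at most 100 keys per request, so a longer key list has to be split into batches first. A minimal sketch of the chunking step (the stand-in key list is only for illustration; each chunk would feed one batch_get_item call):

```python
def chunked(seq, size=100):
    """Yield successive slices of at most `size` items from a list."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

# Stand-in key list, just to show the resulting batch sizes.
ppk_values = [str(n) for n in range(250)]

# Each chunk would become one batch_get_item call, e.g.
# RequestItems={'users': {'Keys': [{'email': v} for v in chunk]}}
batches = list(chunked(ppk_values))
print([len(b) for b in batches])  # [100, 100, 50]
```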
The accepted answer no longer works. For me, the working call format was like so:
import boto3

client = boto3.client('dynamodb')

# ppk_values = list of `foo_id` values (strings) (< 100 in this example)
x = client.batch_get_item(
    RequestItems={
        'my_table_name': {
            'Keys': [{'foo_id': {'S': id}} for id in ppk_values]
        }
    }
)
The type information was required; for me it was "S" for string keys. Without it I got an error saying the library found a str but expected a dict. That is, it wanted {'foo_id': {'S': id}} instead of the simpler {'foo_id': id} that I tried first.
Boto3 now has a version of batch_get_item that lets you pass in the keys in a more natural, Pythonic way without specifying the types. You can find a complete, working code example in https://github.com/awsdocs/aws-doc-sdk-examples. That example deals with some additional nuances around retries, but here's a digest of the parts of the code that answer this question:
import logging

import boto3

logger = logging.getLogger(__name__)

dynamodb = boto3.resource('dynamodb')
movie_table = dynamodb.Table('Movies')
actor_table = dynamodb.Table('Actors')

# movie_list: list of (year, title) pairs; actor_list: list of actor names.
batch_keys = {
    movie_table.name: {
        'Keys': [{'year': movie[0], 'title': movie[1]} for movie in movie_list]
    },
    actor_table.name: {
        'Keys': [{'name': actor} for actor in actor_list]
    }
}

response = dynamodb.batch_get_item(RequestItems=batch_keys)

for response_table, response_items in response['Responses'].items():
    logger.info("Got %s items from %s.", len(response_items), response_table)
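The retry nuance the full example handles is UnprocessedKeys: BatchGetItem can return only part of what you asked for, and the remainder must be re-requested. A hedged sketch of that loop (the function name, the injected dynamodb argument, and the simple linear backoff are my own choices, not the exact logic of the AWS example):

```python
import time

def batch_get_with_retries(dynamodb, batch_keys, max_tries=5, delay=1.0):
    """Call batch_get_item until no UnprocessedKeys remain (or tries run out)."""
    retrieved = {table: [] for table in batch_keys}
    tries = 0
    while batch_keys and tries < max_tries:
        response = dynamodb.batch_get_item(RequestItems=batch_keys)
        for table, items in response.get('Responses', {}).items():
            retrieved[table] += items
        # Keys the service could not process this round get retried.
        batch_keys = response.get('UnprocessedKeys', {})
        if batch_keys:
            tries += 1
            time.sleep(delay * tries)  # simple linear backoff
    return retrieved
```

Passing the resource (or client) in as a parameter also makes the loop easy to unit-test with a stub object that has a batch_get_item method.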