We are designing a big data solution for one of our dashboard applications and are seriously considering AWS Glue for our initial ETL. Glue currently supports JDBC and S3 as targets, but our downstream services and components will work better with DynamoDB. We are wondering what the best approach is to eventually move the records from Glue to DynamoDB.
Should we write to S3 first and then run Lambda functions to insert the data into DynamoDB? Is that the best practice? OR
Should we use a third-party JDBC wrapper for DynamoDB and use Glue to write directly to DynamoDB? (Not sure if this is possible; it sounds a bit scary.) OR
Should we do something else?
Any help is greatly appreciated. Thanks!
I am able to write using boto3. It is definitely not the best approach for loading data, but it works. :)

    import boto3

    dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
    table = dynamodb.Table('BULK_DELIVERY')

    print("Start testing")
    # collect() pulls every row onto the driver, so this only suits small datasets
    for row in df1.rdd.collect():
        var1 = row.sourceCid
        print(var1)
        # One PutItem request per row
        table.put_item(Item={'SOURCECID': "{}".format(var1)})
    print("End testing")
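If you stay with boto3, one variation that may scale a bit better is to write each Spark partition from the executors and let boto3's batch_writer group the puts, instead of collecting everything on the driver. This is only a rough sketch: the helper names (to_item, write_rows, write_partition) are made up for illustration, and the table name, region and sourceCid column are simply carried over from the snippet above.

```python
def to_item(row):
    """Map one row to a DynamoDB item, using the same layout as the snippet above."""
    return {"SOURCECID": "{}".format(row["sourceCid"])}


def write_rows(rows, table):
    """Write rows through the table's batch_writer and return how many were sent.

    `table` is expected to behave like a boto3 DynamoDB Table resource.
    """
    count = 0
    # batch_writer buffers items and sends them as BatchWriteItem requests
    with table.batch_writer() as batch:
        for row in rows:
            batch.put_item(Item=to_item(row))
            count += 1
    return count


def write_partition(rows):
    """Hypothetical per-partition entry point; runs on the Spark executors."""
    import boto3  # imported here so the resource is created on each executor

    # Table name and region are assumptions copied from the snippet above
    table = boto3.resource("dynamodb", region_name="us-east-1").Table("BULK_DELIVERY")
    write_rows(rows, table)


# Usage inside the job (df1 is the Spark DataFrame from the snippet above):
# df1.foreachPartition(write_partition)
```

Each partition opens its own connection, so the table's write capacity still limits how fast this can go.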
You can add the following lines to your Glue ETL script:
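    from awsglue.dynamicframe import DynamicFrame

    # Convert the Spark DataFrame to a DynamicFrame, then write it to DynamoDB
    glueContext.write_dynamic_frame.from_options(
        frame=DynamicFrame.fromDF(df, glueContext, "final_df"),
        connection_type="dynamodb",
        connection_options={"tableName": "pceg_ae_test"},
    )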
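Here df is a Spark DataFrame; DynamicFrame.fromDF converts it into the DynamicFrame that write_dynamic_frame expects. If you already have a DynamicFrame, pass it as frame directly. The AWS Glue documentation lists the table-name option for DynamoDB targets as "dynamodb.output.tableName", so try that key if "tableName" is not accepted.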
For your workloads, Amazon actually recommends using AWS Data Pipeline.
It bypasses Glue, so it is mostly used to load S3 files into DynamoDB, but it may work for you.