We have a DynamoDB table in production that is continuously updated, and we want to load all of its records into Redshift.
We tried the COPY command, but because new records are constantly being inserted into the table, the COPY command runs forever.
What is the best way to load data from a live DynamoDB table into Redshift?
You can utilize the following pattern:
DynamoDB Streams --> AWS Lambda --> Amazon Kinesis Firehose --> Amazon Redshift.
Diagram from AWS article DynamoDB Streams Use Cases and Design Patterns.
Please also see answer here, AWS DynamoDB Stream into Redshift.
A DynamoDB Stream is effectively the same as a Kinesis Data Stream, except that it is generated automatically from new or changed data in DynamoDB. This allows applications to be notified when data is added to a DynamoDB table, or when existing data is changed.
Kinesis Data Firehose can automatically deliver a stream into Redshift (amongst other destinations).
AWS Lambda can run code without provisioning or managing servers. You pay only for the compute time you consume — there's no charge when your code isn't running. You can run code for virtually any type of application or backend service — all with zero administration.
Lambda is useful for inspecting data coming through a stream. For example, it could be used to change the data format or to skip over data that is not required.
Putting it all together: when data is added or modified in DynamoDB, a record describing the change is written to the table's DynamoDB Stream. An AWS Lambda function triggered by the stream can inspect each record and transform or drop it. It can then forward the data to Kinesis Data Firehose, which automatically inserts it into Amazon Redshift.
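A minimal sketch of the Lambda step might look like the following. It assumes the table's stream is configured with a view type that includes the new image (NEW_IMAGE or NEW_AND_OLD_IMAGES), and that a Firehose delivery stream already exists with Redshift as its destination. The name STREAM_NAME, the helper names, and the choice to drop REMOVE events are illustrative assumptions, not something prescribed by AWS.

```python
import json

# Hypothetical delivery stream name; replace with your own.
STREAM_NAME = "dynamodb-to-redshift"


def unwrap(attr):
    """Convert one DynamoDB-typed attribute ({"S": "x"}, {"N": "1"}, ...) to a plain value."""
    (type_tag, value), = attr.items()
    if type_tag == "N":
        try:
            return int(value)
        except ValueError:
            return float(value)
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {k: unwrap(v) for k, v in value.items()}
    if type_tag == "L":
        return [unwrap(v) for v in value]
    # S and BOOL are already plain values; other types are passed through unchanged.
    return value


def to_firehose_records(event):
    """Turn a DynamoDB Streams event into newline-delimited JSON records for Firehose."""
    out = []
    for rec in event.get("Records", []):
        if rec.get("eventName") == "REMOVE":
            continue  # assumption: deletes are not forwarded to Redshift
        image = rec.get("dynamodb", {}).get("NewImage")
        if image is None:
            continue
        row = {k: unwrap(v) for k, v in image.items()}
        out.append({"Data": (json.dumps(row, sort_keys=True) + "\n").encode("utf-8")})
    return out


def handler(event, context, firehose=None):
    """Lambda entry point; `firehose` can be injected for local testing."""
    records = to_firehose_records(event)
    if not records:
        return 0
    if firehose is None:
        import boto3  # available in the Lambda Python runtime
        firehose = boto3.client("firehose")
    # PutRecordBatch accepts at most 500 records per call.
    for start in range(0, len(records), 500):
        firehose.put_record_batch(
            DeliveryStreamName=STREAM_NAME,
            Records=records[start:start + 500],
        )
    return len(records)
```

This sketch does not inspect FailedPutCount in the PutRecordBatch response, so a production version would need to retry failed records. The existing rows in the table still need a one-time initial load; the stream only carries changes made after it was enabled.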
Consider looking into a DynamoDB Streams based solution. Streams provides an ordered log of the data plane events occurring on each DynamoDB partition, so events for each primary key are strictly ordered. You can use the Kinesis Client Library and the DynamoDB Streams Kinesis Adapter to process the stream into Redshift.
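To give a feel for what the Kinesis Client Library and the adapter take care of, here is a rough sketch that reads a stream directly through the low-level DynamoDB Streams API instead. This is a different, simpler technique than the KCL: it assumes a boto3 `dynamodbstreams` client is passed in, the function names are made up for illustration, and it ignores checkpointing, shard listing pagination, and parent/child shard ordering, all of which the KCL handles for you.

```python
def read_shard(client, stream_arn, shard_id, max_polls=10):
    """Read records from one shard, oldest first, for at most `max_polls` calls."""
    iterator = client.get_shard_iterator(
        StreamArn=stream_arn,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start from the oldest available record
    )["ShardIterator"]
    records = []
    for _ in range(max_polls):
        if iterator is None:
            break  # the shard is closed and fully read
        page = client.get_records(ShardIterator=iterator)
        records.extend(page["Records"])
        # An open shard keeps returning an iterator, hence the max_polls cap.
        iterator = page.get("NextShardIterator")
    return records


def read_stream(client, stream_arn, max_polls=10):
    """Read every shard listed in the first page of describe_stream."""
    description = client.describe_stream(StreamArn=stream_arn)["StreamDescription"]
    records = []
    for shard in description["Shards"]:
        records.extend(read_shard(client, stream_arn, shard["ShardId"], max_polls))
    return records
```

With real credentials this could be called as `read_stream(boto3.client("dynamodbstreams"), stream_arn)`, where `stream_arn` is the table's stream ARN. Records within a single shard come back in order, but this sketch makes no attempt to order records across shards.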
At the time of writing, DynamoDB Streams is in preview, but it should be generally available soon.