Consuming/producing data to particular shardID in

2019-08-07 08:03发布

问题:

I need to put the all the records into kinesis from various servers and need to output the data into multiple S3 Files. I have been trying with ShardID, but, not able to make it work out.

Could you please help????

Python/Java would be fine.

回答1:

ShardID is not that important.

  • If you have 20 MB/sec input bandwidth with 20000 request/seconds rate; you should have 20 shards at least.

And with each shard, your data will be spread accross, so it is just about capacity. Those shards does not affect your input and output result. (It also affects parallelization with the help of hash - partition - key but that's another thing, I'm not explaining that not to confuse.)

You should be concerned about "put_record" or "put_records" methods in the producer (ie. input) part; and the record emitted (ie. output) on the consumer side. You should not worry about which shard has the record passed through, you just take the record on the consumer side and process with your business needs.

Using Kinesis Client Library ( https://github.com/awslabs/amazon-kinesis-client ) is the best for this abstraction.

There is also a sample project on GitHub Amazon Kinesis Connectors ( https://github.com/awslabs/amazon-kinesis-connectors ) that does consuming data and uploading it into S3.