Amazon Kinesis and guaranteed ordering

Posted 2020-05-20 07:53

Amazon claims that its Kinesis streaming product guarantees record ordering:

It provides ordering of records, as well as the ability to read and/or replay records in the same order (...)

Kinesis is composed of Streams that are themselves composed of one or more Shards. Records are stored in these Shards. We can write consumer applications that connect to a Shard and read/replay records in the order they were stored.

But can Kinesis guarantee, out of the box, ordering for the Stream itself without pushing ordering logic to the consumers? How can a consumer read records from multiple Shards of the same Stream, making sure the records are read in the same order they were added to the Stream?

3 answers
乱世女痞
Reply #2 · 2020-05-20 08:28

[image not available]

I'm not sure about this, but I think the image above says that ordering is possible across multiple shards.

I assume "data stream" means a logical grouping of shards. If that is true, then ordering across the stream should be possible.

Please check and confirm.

霸刀☆藐视天下
Reply #3 · 2020-05-20 08:34

It seems this is not possible to achieve. Ordering is guaranteed at the shard level, but not across the whole stream.

https://brandur.org/kinesis-order

So back to our original question: how can we guarantee that all records are consumed in the same order in which they’re produced? The answer is that we can’t, but that we shouldn’t let that unfortunate reality bother us too much. Once we’ve scaled our stream to multiple shards, there’s no mechanism that we can use to guarantee that records are consumed in order across the whole stream; only within a single shard.
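To illustrate what pushing the ordering logic to the consumer could look like, here is a minimal sketch that merges records from two shards by arrival time. The record dicts and the `arrival_ms` field are made up for the example. Kinesis records do carry an approximate arrival timestamp, but it is only approximate, so the merged order is a best guess and not the true produce order. Only the order within each shard is reliable.

```python
import heapq
from operator import itemgetter

# Hypothetical records, as a consumer might hold them after reading two shards.
# Each list is already in its shard's own order, which is the only order
# Kinesis guarantees.
shard_a = [
    {"shard": "A", "arrival_ms": 100, "data": "a1"},
    {"shard": "A", "arrival_ms": 120, "data": "a2"},
    {"shard": "A", "arrival_ms": 150, "data": "a3"},
]
shard_b = [
    {"shard": "B", "arrival_ms": 110, "data": "b1"},
    {"shard": "B", "arrival_ms": 120, "data": "b2"},
    {"shard": "B", "arrival_ms": 140, "data": "b3"},
]


def merge_by_arrival(*shards):
    """Best-effort interleaving of several per-shard record lists.

    heapq.merge pulls from each input strictly in sequence, so the relative
    order inside every shard is preserved. The order *between* shards depends
    entirely on the (approximate) timestamps; ties go to the earlier argument.
    """
    return list(heapq.merge(*shards, key=itemgetter("arrival_ms")))


merged = merge_by_arrival(shard_a, shard_b)
merged_data = [record["data"] for record in merged]
```

Note that `a2` and `b2` have the same timestamp here, so nothing in the data says which one was really produced first. That is exactly the gap the quote above describes.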

Deceive 欺骗
Reply #4 · 2020-05-20 08:35

If you need guaranteed order of all data in the stream, you can only have one shard. That, of course, doesn't scale very well. What you need to determine is whether you really need that level of ordering. Is all the data in the stream related to all the other data? The key is to put related data in the same shard (records with the same partition key go to the same shard) and to use multiple shards so that unrelated data can be processed in parallel. If all related data is together in one shard, you can take advantage of the guaranteed ordering. If you really need all the data to be ordered, you're just going to have to deal with the limited scaling that necessarily comes with that.
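A rough sketch of that idea, assuming the shards split the 128-bit hash key space evenly (this holds for a freshly created stream, but not necessarily after resharding). Kinesis hashes the partition key with MD5 to pick a shard. `shard_for_key` and `route` are hypothetical helpers that imitate this locally; they are not part of any AWS SDK.

```python
import hashlib
from collections import defaultdict

HASH_SPACE = 2 ** 128  # an MD5 digest is a 128-bit integer


def shard_for_key(partition_key: str, num_shards: int) -> int:
    """Map a partition key to a shard index.

    Assumption: the shards' hash key ranges divide the hash space evenly.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    return hash_value * num_shards // HASH_SPACE


def route(events, num_shards):
    """Append each (partition_key, payload) to its shard, in arrival order."""
    shards = defaultdict(list)
    for partition_key, payload in events:
        shards[shard_for_key(partition_key, num_shards)].append(
            (partition_key, payload)
        )
    return shards


# Hypothetical events: the order ID is used as the partition key, so all
# events for one order land in one shard and keep their relative order.
events = [
    ("order-1", "created"),
    ("order-2", "created"),
    ("order-1", "paid"),
    ("order-2", "paid"),
    ("order-1", "shipped"),
]
shards = route(events, num_shards=4)

order_1_shard = shards[shard_for_key("order-1", 4)]
order_1_events = [payload for key, payload in order_1_shard if key == "order-1"]
```

With boto3, the equivalent is to pass the entity ID as the `PartitionKey` argument of `put_record`. Events for different orders may still be consumed in any relative order, which is fine as long as they are unrelated.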
