Flink with Ceph as the persistent storage

Published 2019-07-24 18:22

Question:

The Flink documentation suggests that Ceph can be used as persistent storage for state. https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/checkpointing.html

Considering that Ceph is a transactional database, wouldn't it have an adverse effect on Flink's performance?

Answer 1:

Ceph describes itself as a "unified, distributed storage system" and provides a network file system API. As such, it should work seamlessly with Flink's state backends that persist checkpoints to a remote file system.

I'm not aware of people using Ceph with Flink (HDFS and S3 are more commonly used) and have no information about its performance. However, note that Flink is able to write checkpoints asynchronously, so the performance of the storage system does not affect the processing speed of a Flink application. It might, however, constrain the interval at which checkpoints can be taken.
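To make the last point concrete, here is a toy model in Python. It is not Flink code and not how Flink is implemented. It is only a sketch under stated assumptions: one record is processed per tick, at most one checkpoint is in flight at a time, and the function name `simulate` and its parameters are made up for illustration.

```python
def simulate(total_ticks, interval, write_latency):
    """Toy model of asynchronous checkpointing (hypothetical, not Flink's API).

    One record is processed per tick. Snapshots are written in the
    background and take `write_latency` ticks to reach the storage system.
    """
    processed = 0
    completed_at = []        # ticks at which a checkpoint finished
    in_flight_until = None   # tick at which the current write finishes
    last_start = None        # tick at which the last checkpoint started

    for tick in range(total_ticks):
        # The background write finishes once its latency has elapsed.
        if in_flight_until is not None and tick >= in_flight_until:
            completed_at.append(tick)
            in_flight_until = None

        # Start a checkpoint only if none is in flight and the interval has passed.
        due = last_start is None or tick - last_start >= interval
        if in_flight_until is None and due:
            last_start = tick
            in_flight_until = tick + write_latency

        # Record processing never waits for the write.
        processed += 1

    return processed, completed_at


fast = simulate(total_ticks=100, interval=10, write_latency=2)
slow = simulate(total_ticks=100, interval=10, write_latency=25)
print(fast[0], len(fast[1]))  # 100 10
print(slow[0], len(slow[1]))  # 100 3
```

In this model both runs process the same 100 records, but the slow store completes only 3 checkpoints instead of 10: the effective checkpoint interval stretches from the requested 10 ticks to the 25 ticks a write takes.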

Update (Feb. 2018): I noticed that multiple users have reported on Flink's user mailing list that they are using Ceph with Flink.