Disk persistence and replication?

Posted 2019-07-24 13:47

Question:

Messages sent to Kafka are written to disk and replicated for fault tolerance.

If a message is already written to disk, I am not sure why replication (of partitions) is required or how it helps.

Kafka is also said to have high throughput. Doesn't writing to disk hurt performance? Can it be configured to keep messages only in memory and not on disk?

Answer 1:

This blog post by Jay Kreps (one of the original architects of Kafka during his time at LinkedIn) explains how Kafka is engineered as a commit log that can do "2 million writes per second on three cheap machines" and achieve much higher messaging rates than traditional message brokers that are not designed this way.

https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

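Kafka has configurable parameters that control how often the page cache is flushed to disk (for example, the broker settings log.flush.interval.messages and log.flush.interval.ms). It is not possible to run Kafka without eventually writing to disk. Writes are sequential appends to a log file that land in the operating system's page cache first, and recently written messages are usually served to consumers from that cache, so Kafka gets high throughput while the data is still persisted to disk. Disk persistence and replication protect against different failures: data on disk survives a broker restart, while replicating each partition to other brokers keeps the data available if a broker or its disk is lost. How strongly message loss is prevented depends on configuration, such as a replication factor greater than 1 and producers using acks=all.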
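To get a feel for why appending to a log file does not have to be slow, here is a minimal Python sketch that appends records to a file under three flush policies: never calling fsync (the OS page cache decides when to write), fsync every 100 records, and fsync after every record. This is not Kafka's implementation. The function names, file names, and record sizes are made up for illustration, and timings will vary a lot by machine and filesystem.

```python
import os
import tempfile
import time


def append_records(path, records, fsync_every=None):
    """Append records to a log file and return the elapsed seconds.

    fsync_every=None leaves flushing to the OS page cache;
    fsync_every=N forces the data to disk after every N records.
    """
    start = time.perf_counter()
    with open(path, "ab") as log:
        for i, record in enumerate(records, start=1):
            log.write(record)
            if fsync_every and i % fsync_every == 0:
                log.flush()             # push Python's buffer to the OS
                os.fsync(log.fileno())  # ask the OS to write it to disk
    return time.perf_counter() - start


def run_demo(num_records=2000, record_size=100):
    """Write the same records under three hypothetical flush policies."""
    records = [b"x" * (record_size - 1) + b"\n" for _ in range(num_records)]
    policies = [
        ("page cache only", None),
        ("fsync every 100", 100),
        ("fsync every record", 1),
    ]
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for label, every in policies:
            path = os.path.join(tmp, label.replace(" ", "_") + ".log")
            elapsed = append_records(path, records, fsync_every=every)
            results[label] = (elapsed, os.path.getsize(path))
    return results


if __name__ == "__main__":
    for label, (elapsed, size) in run_demo().items():
        print(f"{label:20s} {elapsed * 1000:9.2f} ms  {size} bytes")
```

All three policies end up with the same bytes on disk; what changes is how often the writer waits for the disk. On most systems the per-record fsync run is the slowest by a wide margin, which is the intuition behind relying on the page cache plus replication rather than forcing a flush for every message.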