I am looking into Kafka Streams. I want to filter my stream using a filter with very low selectivity (about one record in a few thousand passes). I was looking at this method:
https://kafka.apache.org/0100/javadoc/org/apache/kafka/streams/kstream/KStream.html#filter(org.apache.kafka.streams.kstream.Predicate)
But I can't find anything that says whether the filter is evaluated by the consumer (I really do not want to transfer many GB to the consumer just to throw them away) or inside the broker (yay!).
If it's evaluated on the consumer side, is there any way to do this in the broker?
Thanks!
Kafka does not support broker-side filtering. If you use the Streams API, filtering is done in your application: the predicate is not evaluated by KafkaConsumer, but within a "processor node" of your topology, i.e., within the Streams API runtime code. Either way, every record is transferred to your application before the predicate sees it.
This might help: https://docs.confluent.io/current/streams/architecture.html
The reason for not supporting broker-side filtering is that brokers (1) only handle byte arrays as key and value data types and (2) use a zero-copy mechanism to achieve high throughput. Broker-side filtering would require deserializing the data on the broker, which would be a major performance hit (deserialization cost and no zero-copy optimization).
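To make that concrete, here is a minimal plain-Java sketch of the situation (it does not use the Kafka API at all). The class and method names (`ClientSideFilterSketch`, `RawRecord`, `filterValues`, `bytesTransferred`) are made up for illustration, and I am assuming UTF-8 string values. The point is only that the predicate needs a typed value, so the bytes must first reach the application and be deserialized there:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustrative model only: none of these names come from the Kafka API.
class ClientSideFilterSketch {

    /** What a broker stores and ships: opaque key/value bytes, no schema. */
    static final class RawRecord {
        final byte[] key;
        final byte[] value;

        RawRecord(byte[] key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Helper to build a raw record from strings (assumption: UTF-8 encoding). */
    static RawRecord raw(String key, String value) {
        return new RawRecord(key.getBytes(StandardCharsets.UTF_8),
                             value.getBytes(StandardCharsets.UTF_8));
    }

    /** Stand-in for the deserializer configured in the application. */
    static String deserialize(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Payload bytes that crossed the network, regardless of what the filter keeps. */
    static long bytesTransferred(List<RawRecord> fetched) {
        long total = 0;
        for (RawRecord r : fetched) {
            total += r.key.length + r.value.length;
        }
        return total;
    }

    /** The filter step: runs in the application, after fetch and deserialization. */
    static List<String> filterValues(List<RawRecord> fetched, Predicate<String> predicate) {
        List<String> kept = new ArrayList<>();
        for (RawRecord r : fetched) {
            String value = deserialize(r.value); // the broker never does this step
            if (predicate.test(value)) {
                kept.add(value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<RawRecord> fetched = List.of(
                raw("k1", "INFO started"),
                raw("k2", "ERROR disk full"),
                raw("k3", "INFO stopped"));

        List<String> kept = filterValues(fetched, v -> v.startsWith("ERROR"));

        System.out.println("bytes fetched: " + bytesTransferred(fetched));
        System.out.println("records kept:  " + kept);
    }
}
```

However selective the predicate is, `bytesTransferred` stays the same, which is exactly the cost the question is worried about.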
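If you would like to do server-side filtering, I would recommend using KSQL. It offers a SQL-like syntax for filtering messages on the server side. Note that "server" here means the KSQL server, not the Kafka broker: KSQL runs on top of Kafka Streams, so the records are still fetched from the brokers and filtered inside the KSQL server. You also have to spend more resources to set up a KSQL server, which involves high availability and replication, among other concerns.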
So if your message throughput is in the thousands of messages per second, I would use KStreams; if you have a larger volume and more complex filtering scenarios, I would go for KSQL.