I am using Kafka and Zookeeper as the main components of my data pipeline, which processes thousands of requests each second. I am using Samza as the real-time data-processing tool for the small transformations I need to make on the data.
My problem is that one of my consumers (let's say ConsumerA) consumes several topics from Kafka and processes them, essentially creating a summary of the topics it digests. I then want to push this summary back to Kafka as a separate topic, but that forms a loop between Kafka and my component.
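To make the pattern concrete, here is a minimal sketch of what ConsumerA does, written against the plain Kafka clients API. The broker address, topic names (topicA, topicB, summary-topic) and the summarisation step are placeholders rather than my real setup:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;

public class ConsumerA {
    public static void main(String[] args) {
        // Consumer side: reads the source topics.
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        consumerProps.put("group.id", "consumer-a");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // Producer side: writes the digested summary back to Kafka.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {

            consumer.subscribe(Arrays.asList("topicA", "topicB")); // placeholder source topics

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Placeholder "summary" transformation of the consumed records.
                    String summary = "summary-of:" + record.value();
                    // Publish the digest to a separate topic on the same cluster.
                    producer.send(new ProducerRecord<>("summary-topic", record.key(), summary));
                }
            }
        }
    }
}
```

So the component subscribes to the source topics with one client and writes the digested records with another, which means it is simultaneously a consumer and a producer against the same cluster.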
This is what bothers me: is this a desirable architecture in Kafka?
Should I instead do all the processing in Samza and store only the digested (summary) information back to Kafka from Samza? The amount of processing I am going to do is quite heavy, though, which is why I want to use a separate component for it (ComponentA). A sketch of what the Samza alternative might look like is below.
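For comparison, this is roughly what I imagine the Samza version would be, using the low-level StreamTask API; the output topic name and the summarisation logic are again placeholders:

```java
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;
import org.apache.samza.system.SystemStream;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskCoordinator;

public class SummaryTask implements StreamTask {
    // Output stream on the "kafka" system; the topic name is a placeholder.
    private static final SystemStream OUTPUT = new SystemStream("kafka", "summary-topic");

    @Override
    public void process(IncomingMessageEnvelope envelope,
                        MessageCollector collector,
                        TaskCoordinator coordinator) {
        // Placeholder summarisation of the incoming message.
        String summary = "summary-of:" + envelope.getMessage();
        // Emit only the digested record back to Kafka.
        collector.send(new OutgoingMessageEnvelope(OUTPUT, summary));
    }
}
```

Here the input topics would be declared in the job configuration (e.g. task.inputs=kafka.topicA,kafka.topicB), and Samza itself would handle the consumption side.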
I guess my question can be generalized to all kinds of data pipelines.
So, is it good practice for a component to be both a consumer and a producer in a data pipeline?