I have a data flow use case where I want to define topics based on each of the customer repositories (which might number in the 100,000s). Each data flow would be a topic with partitions (on the order of a few tens) defining the different stages of the flow.
Is Kafka a good fit for a scenario like this? If not, how would I remodel my use case to handle such scenarios? An additional constraint is that one customer repository's data cannot be mingled with another's, even during processing. A sketch of the layout I have in mind is below.
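For illustration only, the topic layout I have in mind would look roughly like this (topic names, counts, and the replication factor are placeholders, not a working setup):

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class TopicLayoutSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        // One topic per customer repository; the partitions of each topic would
        // represent the stages of that repository's data flow.
        List<NewTopic> topics = new ArrayList<>();
        for (int repo = 0; repo < 100_000; repo++) {               // order of 100,000s of repositories
            String topicName = "customer-repo-" + repo + "-flow";  // hypothetical naming scheme
            int stagePartitions = 20;                              // "a few tens" of stages
            topics.add(new NewTopic(topicName, stagePartitions, (short) 3));
        }

        // Purely illustrative: creating this many topics in a single request
        // is not something I would actually do in production.
        try (AdminClient admin = AdminClient.create(props)) {
            admin.createTopics(topics).all().get();
        }
    }
}
```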
Update, September 2018: As of Kafka v2.0, a Kafka cluster can have hundreds of thousands of topics. See https://blogs.apache.org/kafka/entry/apache-kafka-supports-more-partitions.
Initial answer below for posterity:
The rule of thumb is that the number of Kafka topics can be in the thousands.
The Kafka FAQ gives an abstract guideline on how many topics and partitions a cluster can handle.
The article http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/ (written by Jun Rao) adds further details, focusing in particular on the impact of the number of partitions.
IMHO your use case / model is a bit of a stretch for a single Kafka cluster, though not necessarily for Kafka in general. With the little information you shared (I understand that a public forum is not the best place for sensitive discussions :-P), the only off-the-cuff suggestion I can offer is to consider using more than one Kafka cluster, since you mentioned that customer data must be strictly isolated anyway (including during the processing steps).
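To make that concrete, here is a minimal sketch of how producers could be routed to per-customer (or per-customer-group) clusters; the customer-to-cluster mapping, bootstrap addresses, and topic naming are assumptions made up for illustration:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class PerCustomerClusterRouter {
    // Hypothetical assignment of customers (or customer groups) to dedicated clusters.
    private static final Map<String, String> CUSTOMER_TO_BOOTSTRAP = Map.of(
            "customer-a", "kafka-cluster-1.example.com:9092",
            "customer-b", "kafka-cluster-2.example.com:9092");

    // One producer per cluster, created lazily; records for different customers
    // never leave their assigned cluster.
    private final Map<String, KafkaProducer<String, String>> producers = new HashMap<>();

    public void send(String customerId, String payload) {
        String bootstrap = CUSTOMER_TO_BOOTSTRAP.get(customerId);
        KafkaProducer<String, String> producer = producers.computeIfAbsent(bootstrap, servers -> {
            Properties props = new Properties();
            props.put("bootstrap.servers", servers);
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            return new KafkaProducer<>(props);
        });
        // Topic name follows the per-repository scheme from the question; still one
        // topic per customer repository, just spread across multiple clusters.
        producer.send(new ProducerRecord<>("customer-repo-" + customerId + "-flow", customerId, payload));
    }
}
```

A similar routing layer would apply to consumers and any processing jobs, so the isolation holds across the whole pipeline, not just at ingestion.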
I hope this helps a bit!