For testing purpose, I need to simulate client for generating 100,000 messages per second and send them to kafka topic. Is there any tool or way that can help me generate these random messages?
问题:
回答1:
There's a built-in tool for generating dummy load, located in bin/kafka-producer-perf-test.sh
(https://github.com/apache/kafka/blob/trunk/bin/kafka-producer-perf-test.sh). You may refer to https://github.com/apache/kafka/blob/trunk/tools/src/main/java/org/apache/kafka/tools/ProducerPerformance.java#L106 to figure out how to use it.
One usage example would be like that:
bin/kafka-producer-perf-test.sh --broker-list localhost:9092 --messages 10000000 --topic test --threads 10 --message-size 100 --batch-size 10000 --throughput 100000
The key here is the --throughput 100000
flag which indicated "throttle maximum message amount to approx. 100000 messages per second"
回答2:
I'd also suggest to take a look at https://github.com/josephadler/eventsim, which will produce more "realistic" synthetic data (yeah, I am aware of the irony of what I just said :-P):
Eventsim is a program that generates event data for testing and demos. It's written in Scala, because we are big data hipsters (at least sometimes). It's designed to replicate page requests for a fake music web site (picture something like Spotify); the results look like real use data, but are totally fake. You can configure the program to create as much data as you want: data for just a few users for a few hours, or data for a huge number of users of users over many years. You can write the data to files, or pipe it out to Apache Kafka.
You can use the fake data for product development, correctness testing, demos, performance testing, training, or in any other place where a stream of real looking data is useful. You probably shouldn't use this data to research machine learning algorithms, and definitely shouldn't use it to understand how real people behave.
回答3:
You can make use of Kafka Connect to generate random test data. Check out this custom source Connector https://github.com/xushiyan/kafka-connect-datagen
It allows you to define some settings like message template and randomizable fields to generate test data. Also check out this post for detailed demonstration.