what is the difference between kafka ProducerRecor

I'm measuring the kafka producer producer performance. Currently I've met two clients with bit different configuration and usage:

Common:

def buildKafkaConfig(hosts: String, port: Int): Properties = {
  val props = new Properties()    
  props.put("metadata.broker.list", brokers)
  props.put("serializer.class", "kafka.serializer.StringEncoder")
  props.put("producer.type", "async") 
  props.put("request.required.acks", "0")
  props.put("queue.buffering.max.ms", "5000")
  props.put("queue.buffering.max.messages", "2000")
  props.put("batch.num.messages", "300")
  props
}

First Client:

"org.apache.kafka" % "kafka_2.11" % "0.8.2.2"

Usage:

val kafkaConfig = KafkaUtils.buildKafkaConfig("kafkahost", 9092)
val producer = new Producer[String, String](new ProducerConfig(kafkaConfig))

// ... somewhere in code 
producer.send(new KeyedMessage[String, String]("my-topic", data))

Second Client:

"org.apache.kafka" % "kafka-clients" % "0.8.2.2"

Usage:

val kafkaConfig = KafkaUtils.buildKafkaConfig("kafkahost", 9092)
val producer = new KafkaProducer[String, String](kafkaConfig)
// ... somewhere in code 
producer.send(new ProducerRecord[String, String]("my-topic", data))

My questions are:

What is the difference between 2 clients?
Which properties should I configure, take into account to achieve optimal, high heavy writes performance, for high scale application?

标签： scala apache-kafka kafka-producer-api

1条回答

闹够了就滚

2楼-- · 2019-04-13 02:30

what is the difference between 2 clients?

They are simply old vs new APIs. Kafka starting 0.8.2.x exposed a new set of API's to work with kafka, older being Producer which works with KeyedMessage[K,V] where the new API is KafkaProducer with ProducerRecord[K,V]:

As of the 0.8.2 release we encourage all new development to use the new Java producer. This client is production tested and generally both faster and more fully featured than the previous Scala client.

You should preferably be using the new supported version.

Which properties should I configure, take into account to achieve optimal, high heavy writes performance, for high scale application?

This is a very broad question, which depends a lot on the architecture of your software. It varies with scale, amount of producers, amount of consumers, etc.. There are many things to be taken into account. I would suggest going through the documentation and reading up the sections talking about Kafka's architecture and design to get a better picture of how it works internally.

Generally speaking, from my experience you'll need to balance the replication factor of your data, along with retention times and number of partitions each queue goes into. If you have more specific questions down the road, you should definitely post a question.

0人赞添加讨论(0) 举报

what is the difference between kafka ProducerRecor

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间