I've been using pyspark for Spark Streaming (Spark 2.0.2) with Kafka (0.10.1.0) successfully, but my use case is better suited to Structured Streaming. I've attempted to follow the example from the integration guide: https://spark.apache.org/docs/2.1.0/structured-streaming-kafka-integration.html
with the following analogous code:
# spark is an existing SparkSession (as in the pyspark shell)
ds1 = (spark
    .readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
    .option("subscribe", "topic1")
    .load())

query = (ds1
    .writeStream
    .outputMode('append')
    .format('console')
    .start())

query.awaitTermination()
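In case it matters, I'm launching the script through spark-submit with the Kafka SQL connector on the classpath, along these lines (the script name here is a placeholder; the package version matches my Spark version):

spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.0.2 my_structured_stream.py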
However, I always end up with the following error:
: org.apache.kafka.common.config.ConfigException:
Missing required configuration "partition.assignment.strategy" which has no default value
I also tried adding this to the set of options when creating ds1:
.option("partition.assignment.strategy", "range")
But explicitly assigning it a value didn't stop the error, nor did any other value I could find online or in the Kafka documentation (such as "roundrobin").
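One thing I wasn't sure about: the integration guide passes Kafka's own consumer settings with a kafka. prefix, and the 0.10 consumer expects an assignor class name rather than a short name, so if that's the issue the option would presumably look like this instead:

.option("kafka.partition.assignment.strategy", "org.apache.kafka.clients.consumer.RangeAssignor")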
I also tried this with the "assign" option and got the same error (our Kafka host is set up for assign: each consumer is assigned exactly one partition, and we don't do any rebalancing).
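For reference, that attempt followed the JSON form shown in the guide (the topic name and partition number are placeholders):

ds1 = (spark
    .readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
    .option("assign", '{"topic1": [0]}')
    .load())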
Any idea what's going on here? The documentation isn't much help (probably since Structured Streaming is still experimental). Also, is there any way to do Structured Streaming using KafkaUtils, or is this the only gateway?