I'm trying to read messages from Kafka (version 0.10) in Spark and print them.
val spark = SparkSession
  .builder
  .appName("StructuredNetworkWordCount")
  .config("spark.master", "local")
  .getOrCreate()

import spark.implicits._

val ds1 = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "topicA")
  .load()
ds1.collect.foreach(println)
ds1.writeStream
.format("console")
.start()
ds1.printSchema()
I'm getting the following error:
Exception in thread "main"
org.apache.spark.sql.AnalysisException: Queries with streaming sources must be executed with writeStream.start();;
I struggled a lot with this issue. I tried every suggested solution from various blogs, but in my case there were a few statements between calling start() on the query and the awaitTermination() call at the very end, and that is what caused this.
Please try it in this fashion; it works perfectly for me. Working example:
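A minimal, self-contained sketch of that pattern (console sink assumed; the Kafka options are taken from the question):

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder
  .appName("StructuredNetworkWordCount")
  .config("spark.master", "local")
  .getOrCreate()

val ds1 = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "topicA")
  .load()

// start the query and block on it immediately --
// no other statements between start() and awaitTermination()
val query = ds1.writeStream
  .format("console")
  .start()

query.awaitTermination()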
If you write it in this way, it will cause an exception/error:
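A sketch of the failing shape, assuming the ds1 from the question; the action on the streaming Dataset between start() and awaitTermination() is what trips it:

val query = ds1.writeStream
  .format("console")
  .start()

// extra statements between start() and awaitTermination();
// collect is an action on a streaming Dataset, so it throws
ds1.collect.foreach(println)

query.awaitTermination()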
This will throw the given exception and close your streaming driver.
I fixed the issue by using the following code:
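A sketch of that fix, reusing the ds1 from the question: send the stream to the console sink and block on it, instead of calling collect on the streaming Dataset (outputMode("append") is an assumption):

// write to the console sink and block until the query terminates
ds1.writeStream
  .format("console")
  .outputMode("append")
  .start()
  .awaitTermination()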
You are branching the query plan: from the same ds1 you are trying to:
ds1.collect.foreach(...)
ds1.writeStream.format(...){...}
But you are only calling .start() on the second branch, leaving the other dangling without a termination, which in turn throws the exception you are getting back. The solution is to start both branches and await termination.
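A sketch of that solution, assuming the ds1 from the question. The inspection branch is turned into a second streaming query via the memory sink (the memory sink and the ds1_table name are illustrative stand-ins for the collect), so both branches can be started and awaited:

// branch 1: console output
val consoleQuery = ds1.writeStream
  .format("console")
  .start()

// branch 2: an in-memory table that can be inspected with spark.sql
val memoryQuery = ds1.writeStream
  .format("memory")
  .queryName("ds1_table")
  .start()

// block until any of the started queries terminates
spark.streams.awaitAnyTermination()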
Kindly remove ds1.collect.foreach(println) and ds1.printSchema(); use outputMode and awaitAnyTermination for the background process, waiting until any of the queries on the associated spark.streams has terminated. For example:
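A sketch with those changes applied to the question's code; outputMode("append") is an assumption:

val consoleOutput = ds1.writeStream
  .format("console")
  .outputMode("append")
  .start()

// waits until any query on spark.streams terminates
spark.streams.awaitAnyTermination()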
I was able to resolve this issue with the following code. In my scenario, I had multiple intermediate DataFrames, which were basically transformations made on the inputDF.
joinedDF is the result of the last transformation performed.
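A sketch of that shape; inputDF, the intermediate transformation, and joinedDF are placeholders for the actual pipeline:

// inputDF: the streaming source
val inputDF = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "topicA")
  .load()

// intermediate transformations (stand-ins for the real ones)
val parsedDF = inputDF.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
val joinedDF = parsedDF  // result of the last transformation

// call writeStream.start() on the final DataFrame only, then block
joinedDF.writeStream
  .format("console")
  .start()
  .awaitTermination()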