I am using Kafka version 2.11-1.0.1 and Spark version 2.0.2. I need to build a DataFrame from the Kafka response. How can I create a DataFrame from a Kafka stream? Thanks in advance.
Answer 1:
As you said,
kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer3", {topic: 1})
lines = kvs.map(lambda x: x[1])
Here, lines is a DStream of RDDs, not a single RDD. Hence, to get a DataFrame you have to convert each RDD in the DStream into a DataFrame, one per batch.
Something like this,
lines.foreachRDD(lambda rdd: rdd.toDF())
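Two things to watch for with the one-liner above: toDF() generally cannot infer a schema from an RDD of plain strings, and the resulting DataFrame is thrown away unless you do something with it inside the function. Here is a slightly fuller sketch of the same idea. The names to_row and process and the column name "value" are my own choices, not anything from your code, and I am assuming the messages are plain strings:

```python
def to_row(value):
    # Wrap the message string in a 1-tuple so Spark can infer a one-column schema.
    return (value,)


def process(rdd):
    # Skip empty batches: schema inference fails on an empty RDD.
    if rdd.isEmpty():
        return None
    # Imported here, on the driver, only when a batch actually has data.
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(rdd.map(to_row), ["value"])
    df.show()  # replace with your own DataFrame logic
    return df


# Hypothetical wiring, using `lines` and `ssc` from the snippet above:
# lines.foreachRDD(process)
# ssc.start()
# ssc.awaitTermination()
```

All the DataFrame work has to happen inside process, because foreachRDD ignores the return value of the function you pass to it.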