I want to convert an org.apache.spark.sql.DataFrame
to an org.apache.spark.rdd.RDD[(String, String)]
in Databricks. Can anyone help?
Background (and a better solution is also welcome): I have a Kafka stream which (after some steps) becomes a two-column DataFrame. I would like to put this into a Redis cache, with the first column as the key and the second column as the value.
More specifically, the type of the input is: lastContacts: org.apache.spark.sql.DataFrame = [serialNumber: string, lastModified: bigint]. I try to put it into Redis as follows:
sc.toRedisKV(lastContacts)(redisConfig)
The error message looks like this:
notebook:20: error: type mismatch;
found : org.apache.spark.sql.DataFrame
(which expands to) org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
required: org.apache.spark.rdd.RDD[(String, String)]
sc.toRedisKV(lastContacts)(redisConfig)
I have already played around with some ideas (such as the .rdd function), but none of them helped.
You can use df.rdd.map(row => ...) to convert the DataFrame to an RDD, mapping each Row to whatever element type you need.
For example:
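Assuming lastContacts is a static (non-streaming) DataFrame with the schema from your question, a rough sketch like this should produce the RDD[(String, String)] that toRedisKV expects (the column names are taken from your question):

import org.apache.spark.rdd.RDD

// .rdd gives an RDD[Row]; map each Row to a (key, value) pair of Strings.
// lastModified is a bigint (Long), so convert it to a String for Redis.
val lastContactsKV: RDD[(String, String)] = lastContacts.rdd.map { row =>
  (row.getAs[String]("serialNumber"), row.getAs[Long]("lastModified").toString)
}

sc.toRedisKV(lastContactsKV)(redisConfig)

Note that this only works on a batch DataFrame; if lastContacts comes straight from a streaming source you will hit the AnalysisException discussed below.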
Please refer to https://community.hortonworks.com/questions/106500/error-in-spark-streaming-kafka-integration-structu.html regarding the AnalysisException: Queries with streaming sources must be executed with writeStream.start().
You need to wait for the termination of the query using query.awaitTermination() to prevent the process from exiting while the query is active.
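As a rough sketch (the sink and output mode here are placeholders, not taken from your setup):

import org.apache.spark.sql.streaming.StreamingQuery

// Start the streaming query; replace "console" with the sink you actually use.
val query: StreamingQuery = lastContacts.writeStream
  .format("console")
  .outputMode("append")
  .start()

// Block the driver until the streaming query terminates.
query.awaitTermination()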