I want to convert an org.apache.spark.sql.DataFrame to org.apache.spark.rdd.RDD[(String, String)] in Databricks. Can anyone help?
Background (a better solution is also welcome): I have a Kafka stream which, after some processing steps, becomes a two-column DataFrame. I would like to write it to a Redis cache, using the first column as the key and the second column as the value.
More specifically, the input has this type: lastContacts: org.apache.spark.sql.DataFrame = [serialNumber: string, lastModified: bigint]. I try to write it to Redis as follows:
sc.toRedisKV(lastContacts)(redisConfig)
The error message looks like this:
notebook:20: error: type mismatch;
found : org.apache.spark.sql.DataFrame
(which expands to) org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
required: org.apache.spark.rdd.RDD[(String, String)]
sc.toRedisKV(lastContacts)(redisConfig)
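The error message shows the core mismatch: a DataFrame is just Dataset[Row], while the call expects pairs of strings. Note also that lastModified is a bigint, so it has to become a String before it fits RDD[(String, String)]. One possible route, sketched below under the assumption that the column names are the serialNumber and lastModified from the question, is to cast the value column to string, turn the DataFrame into a typed Dataset[(String, String)] with .as, and only then call .rdd. The object name, the local SparkSession and the sample rows are hypothetical stand-ins for the notebook's own session and the real lastContacts; the Redis write itself is left out because it needs a running Redis.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

object TypedDatasetToKV {
  // DataFrame is Dataset[Row]. Casting the bigint value column to string and
  // calling .as[(String, String)] yields a typed Dataset whose .rdd is
  // RDD[(String, String)]. For tuple types, .as maps columns by position
  // (first column -> _1, second column -> _2).
  // Assumption: the columns are named serialNumber and lastModified.
  def toKeyValueRDD(spark: SparkSession, df: DataFrame): RDD[(String, String)] = {
    import spark.implicits._
    df.select($"serialNumber", $"lastModified".cast("string"))
      .as[(String, String)]
      .rdd
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical local session; in Databricks the notebook already provides `spark`.
    val spark = SparkSession.builder().master("local[*]").appName("kv-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows standing in for lastContacts.
    val lastContacts = Seq(("SN-001", 1700000000000L), ("SN-002", 1700000001000L))
      .toDF("serialNumber", "lastModified")

    val kv: RDD[(String, String)] = toKeyValueRDD(spark, lastContacts)
    kv.collect().foreach(println)
    spark.stop()
  }
}
```

With kv in hand, the call from the question would become sc.toRedisKV(kv)(redisConfig). Doing the cast in Spark SQL keeps the conversion inside the query plan instead of in a hand-written lambda.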
I have already tried a few ideas (such as calling .rdd), but none of them helped.
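Calling .rdd on its own is not enough because it returns RDD[Row], not RDD[(String, String)]. A minimal sketch of the missing step maps each Row to a tuple, assuming (as the schema in the question suggests) that column 0 is a non-null string and column 1 is a non-null bigint, which Spark exposes as a Scala Long. The object name, the local SparkSession and the sample data are hypothetical illustrations, not part of the original setup.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

object RowRddToKV {
  // df.rdd gives RDD[Row]; map each Row to a (key, value) pair of Strings.
  // Assumption: column 0 is a non-null string (serialNumber) and column 1 is a
  // non-null bigint (lastModified); getLong would fail on a null value.
  def toKeyValueRDD(df: DataFrame): RDD[(String, String)] =
    df.rdd.map(row => (row.getString(0), row.getLong(1).toString))

  def main(args: Array[String]): Unit = {
    // Hypothetical local session; in Databricks the notebook already provides `spark`.
    val spark = SparkSession.builder().master("local[*]").appName("kv-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows standing in for lastContacts.
    val lastContacts = Seq(("SN-001", 1700000000000L), ("SN-002", 1700000001000L))
      .toDF("serialNumber", "lastModified")

    val kv: RDD[(String, String)] = toKeyValueRDD(lastContacts)
    kv.collect().foreach(println)
    spark.stop()
  }
}
```

The resulting kv has exactly the type the error message asks for, so it could be passed to sc.toRedisKV(kv)(redisConfig) in place of lastContacts. If lastModified can be null, the typed-Dataset cast is the safer choice, since getLong throws on nulls.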