I've got a Cassandra table with a text column named snapshot containing JSON objects; its columns are:
[identifier, timestamp, snapshot]
I understood that, to run transformations on that field with Spark, I need to convert it into its own RDD and derive a JSON schema from it.
Is that correct? How should I proceed to do that?
Edit: For now I managed to create an RDD from a single text field:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import com.datastax.spark.connector._

val conf = new SparkConf().setAppName("signal-aggregation")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
val snapshots = sc.cassandraTable[(String, String, String)]("listener", "snapshots")
val first = snapshots.first()
val firstJson = sqlContext.jsonRDD(sc.parallelize(Seq(first._3)))
firstJson.printSchema()
Which shows me the JSON schema. Good!
How do I tell Spark to apply this schema to every row of the snapshots table, so that I get an RDD of the parsed snapshot field from each row?
Almost there; you just want to pass an RDD[String] containing your JSON into the jsonRDD method. A quick example:
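A sketch of that approach, assuming Spark 1.x with the DataStax Cassandra connector and the same listener.snapshots table as in the question (the temp-table name "snapshots" is just an illustrative choice):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import com.datastax.spark.connector._

val conf = new SparkConf().setAppName("signal-aggregation")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Pull the whole table and keep only the third column (the JSON text)
val jsonStrings = sc.cassandraTable[(String, String, String)]("listener", "snapshots")
  .map { case (_, _, snapshot) => snapshot }

// jsonRDD scans the rows and infers a single schema covering all of them
val snapshots = sqlContext.jsonRDD(jsonStrings)
snapshots.printSchema()

// Register it so the nested JSON fields can be queried with SQL
snapshots.registerTempTable("snapshots")
```

Note that jsonRDD infers the schema from the data itself, so rows whose JSON shapes differ get merged into one schema with nullable fields; if you already know the schema, the jsonRDD overload that takes an explicit StructType avoids the extra inference pass.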