Convert DataFrame to RDD[Map] in Scala

Published 2020-05-09 16:45

Question:

I want to convert a DataFrame created like this:

case class Student(name: String, age: Int)

val dataFrame: DataFrame = sql.createDataFrame(
  sql.sparkContext.parallelize(List(Student("Torcuato", 27), Student("Rosalinda", 34))))

When I collect the results from the DataFrame, I get an Array[org.apache.spark.sql.Row]: Array([Torcuato,27], [Rosalinda,34])

I'm looking to convert the DataFrame into an RDD[Map], e.g.:

Map("name" -> nameOfFirst, "age" -> ageOfFirst)
Map("name" -> nameOfSecond, "age" -> ageOfSecond)

I tried to use map with x._1, but that does not work on Array[org.apache.spark.sql.Row]. How can I perform this transformation?
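
(For context: a Row is not a tuple, which is why x._1 does not compile. Individual fields can be read through the standard Row accessors; a minimal sketch against the dataFrame above:)

val rows = dataFrame.collect()
rows(0).getString(0)        // "Torcuato" -- positional access
rows(0).getAs[Int]("age")   // 27 -- access by column name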

Answer 1:

You can use the map function with pattern matching to do the job here:

import org.apache.spark.sql.Row

// Drop to the underlying RDD[Row] first, then pattern-match each Row
dataFrame.rdd
  .map { case Row(name, age) => Map("name" -> name, "age" -> age) }

This will result in an RDD[Map[String, Any]]. (On Spark 2.x, calling map directly on the DataFrame would require an Encoder for Map[String, Any], which does not exist; going through .rdd first is what keeps the result a plain RDD.)
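
If you'd rather not hard-code the column names, a schema-driven variant works too. This is a sketch relying on Row.getValuesMap, which builds a Map from a list of field names; rows coming from a DataFrame carry their schema, so the names can be taken from there:

import org.apache.spark.sql.Row

// Build each Map from the Row's own schema instead of hard-coded names
val maps = dataFrame.rdd.map { row =>
  row.getValuesMap[Any](row.schema.fieldNames)
}

maps.collect().foreach(println)
// Map(name -> Torcuato, age -> 27)
// Map(name -> Rosalinda, age -> 34)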