I have an RDD of org.apache.spark.mllib.linalg.Vector in which each vector holds three values [Int Int Int]. I am trying to convert it into a DataFrame using this code:
import sqlContext.implicits._
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.types.StructField
import org.apache.spark.sql.types.DataTypes
import org.apache.spark.sql.types.ArrayData
Here vectrdd is an RDD whose elements are of type org.apache.spark.mllib.linalg.Vector.
val vectarr = vectrdd.toArray()
case class RFM(Recency: Integer, Frequency: Integer, Monetary: Integer)
val df = vectarr.map { case Array(p0, p1, p2) => RFM(p0, p1, p2) }.toDF()
I am getting the following error:
warning: fruitless type test: a value of type
org.apache.spark.mllib.linalg.Vector cannot also be a Array[T]
val df = vectarr.map { case Array(p0, p1, p2) => RFM(p0, p1, p2) }.toDF()
error: pattern type is incompatible with expected type;
found : Array[T]
required: org.apache.spark.mllib.linalg.Vector
val df = vectarr.map { case Array(p0, p1, p2) => RFM(p0, p1, p2) }.toDF()
The second method I tried is this:
val vectarr=vectrdd.toArray().take(2)
case class RFM(Recency: String, Frequency: String, Monetary: String)
val df = vectrdd.map { case (t0, t1, t2) => RFM(p0, p1, p2) }.toDF()
I got this error:
error: constructor cannot be instantiated to expected type;
found : (T1, T2, T3)
required: org.apache.spark.mllib.linalg.Vector
val df = vectrdd.map { case (t0, t1, t2) => RFM(p0, p1, p2) }.toDF()
I used this example as a guide: Convert RDD to Dataframe in Spark/Scala
vectarr will have type Array[org.apache.spark.mllib.linalg.Vector], so in the pattern matching you cannot match Array(p0, p1, p2), because what is being matched is a Vector, not an Array.
Also, you should not do val vectarr = vectrdd.toArray(): this converts the RDD to an Array, and then the final call to toDF will not work, since toDF only works on RDDs.
The correct line (provided you change RFM to have Doubles) converts each Vector to an array inside the RDD before pattern matching; equivalently, replace val vectarr = vectrdd.toArray() (which produces Array[Vector]) with val arrayRDD = vectrdd.map(_.toArray) (producing RDD[Array[Double]]).
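A minimal sketch of that approach, assuming a Spark 1.x-style SQLContext as in the question. The object name VectorRddToDf, the helper toRfmDf, and the sample vectors are hypothetical and only there to make the example self-contained; RFM is declared with Double fields because Vector.toArray yields Array[Double].

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext}

// Declared at top level so Spark can derive the schema by reflection.
// Fields are Double because Vector.toArray returns Array[Double].
case class RFM(Recency: Double, Frequency: Double, Monetary: Double)

object VectorRddToDf {

  // Hypothetical helper: keeps the data in an RDD the whole way through.
  def toRfmDf(vectrdd: RDD[Vector], sqlContext: SQLContext): DataFrame = {
    import sqlContext.implicits._
    vectrdd
      .map(_.toArray)                                    // RDD[Array[Double]]
      .map { case Array(p0, p1, p2) => RFM(p0, p1, p2) } // RDD[RFM]
      .toDF()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("vector-rdd-to-df").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Made-up sample data standing in for the real vectrdd.
    val vectrdd: RDD[Vector] = sc.parallelize(Seq(
      Vectors.dense(1.0, 2.0, 3.0),
      Vectors.dense(4.0, 5.0, 6.0)
    ))

    toRfmDf(vectrdd, sqlContext).show()
    sc.stop()
  }
}
```

Note that the pattern Array(p0, p1, p2) only matches vectors of exactly three elements; any other length would fail with a MatchError at runtime.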