I am experiencing a very strange behaviour from VectorAssembler
and I was wondering if anyone else has seen this.
My scenario is pretty straightforward. I parse data from a CSV
file where I have some standard Int
and Double
fields and I also calculate some extra columns. My parsing function returns this:
val joined = countPerChannel ++ countPerSource //two arrays of Doubles joined
(label, orderNo, pageNo, Vectors.dense(joinedCounts))
My main function uses the parsing function like this:
val parsedData = rawData.filter(row => row != header).map(parseLine)
val data = sqlContext.createDataFrame(parsedData).toDF("label", "orderNo", "pageNo","joinedCounts")
I then use a VectorAssembler
like this:
val assembler = new VectorAssembler()
.setInputCols(Array("orderNo", "pageNo", "joinedCounts"))
.setOutputCol("features")
val assemblerData = assembler.transform(data)
So when I print a row of my data before it goes into the VectorAssembler
it looks like this:
[3.2,17.0,15.0,[0.0,0.0,0.0,0.0,3.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,2.0]]
After the transform function of VectorAssembler I print the same row of data and get this:
[3.2,(18,[0,1,6,9,14,17],[17.0,15.0,3.0,1.0,4.0,2.0])]
What on earth is going on? What has the VectorAssembler
done? I 've double checked all the calculations and even followed the simple Spark examples and cannot see what is wrong with my code. Can you?