I have a dataframe that contains string columns and I am planning to use it as input for k-means using spark and scala. I am converting my string typed columns of the dataframe using the method below:
val toDouble = udf[Double, String]( _.toDouble)
val analysisData = dataframe_mysql.withColumn("Event", toDouble(dataframe_mysql("event"))).withColumn("Execution", toDouble(dataframe_mysql("execution"))).withColumn("Info", toDouble(dataframe_mysql("info")))
val assembler = new VectorAssembler()
.setInputCols(Array("execution", "event", "info"))
.setOutputCol("features")
val output = assembler.transform(analysisData)
println(output.select("features", "execution").first())
when I print the analysisData schema the convertion is correct. but I am getting an exception: VectorAssembler does not support the StringType type which means that my values are still strings! how can I convert the values and not only the schema type?
thanks