I have a CSV file with the following data:
1,2,5
2,4
2,3
I want to load it into a DataFrame whose schema is a single column holding an array of strings.
The output should look like this:
[1, 2, 5]
[2, 4]
[2, 3]
This has been answered for Scala here: Spark: Convert column of string to an array
I want to do the same thing in Java.
Please help.
In Java, read your file using the
spark.read().text(String path)
method, which gives you one string per line, and then call the split
function on that column to turn each line into an array.
You can also use the VectorAssembler class to combine numeric columns into a single feature vector, which is particularly useful with pipelines:
https://spark.apache.org/docs/2.2.0/ml-features.html#vectorassembler
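For the spark.read().text plus split approach, something along these lines should work. Treat it as a minimal sketch rather than a tested solution for your setup: the class name CsvLinesToArray, the helper loadAsArray, the output column name "values", and the local[*] master are all made up for the example, and the file path is taken from the first command-line argument.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.split;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CsvLinesToArray {

    // Hypothetical helper: reads every line of the file as one string and
    // splits it on commas, giving a single column of type array<string>.
    static Dataset<Row> loadAsArray(SparkSession spark, String path) {
        // text() returns one row per line, in a single string column named "value".
        Dataset<Row> lines = spark.read().text(path);
        // split() takes a regex pattern; a plain comma is enough here.
        // "values" is just an illustrative column name.
        return lines.select(split(col("value"), ",").as("values"));
    }

    public static void main(String[] args) {
        // local[*] is an assumption so the sketch can run without a cluster.
        SparkSession spark = SparkSession.builder()
                .appName("CsvLinesToArray")
                .master("local[*]")
                .getOrCreate();

        Dataset<Row> arrays = loadAsArray(spark, args[0]);
        arrays.printSchema();   // shows the array<string> column
        arrays.show(false);     // prints the rows without truncation
        spark.stop();
    }
}
```

Reading the file as text instead of with the CSV reader avoids the problem that the lines have different numbers of fields. The elements stay strings; if you need numbers, you could cast the column afterwards, for example to array<int>.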