I have a CSV file with a single column, and the rows are defined as follows:
123 || food || fruit
123 || food || fruit || orange
123 || food || fruit || apple
I want to create a CSV file with a single column and distinct row values, like this:
orange
apple
I tried using the following code:
val data = sc.textFile("fruits.csv")
val rows = data.map(_.split("||"))
val rddnew = rows.flatMap( arr => {
  val text = arr(0)
  val words = text.split("||")
  words.map( word => ( word, text ) )
} )
But this code does not give me the result I want.
Can anyone please help me with this?
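You can solve this by escaping the delimiter when you split. split takes a regex, and | is the regex alternation operator, so split("||") matches the empty string and breaks the line between every character instead of at the literal ||. Split on "\\|\\|" (or on Pattern.quote("||")) instead.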
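A minimal sketch of the escaped split, assuming the value you want is always the fourth field of a row and that rows with fewer than four fields should be dropped. FruitFields and fourthField are hypothetical names used only for illustration.

```scala
import java.util.regex.Pattern

// Hypothetical helper, not from the original post.
object FruitFields {
  // Pattern.quote escapes the delimiter so "|" is not read as regex alternation.
  private val Delimiter = Pattern.quote("||")

  // Returns the trimmed fourth field, or None when the row has fewer than four fields.
  def fourthField(line: String): Option[String] = {
    val fields = line.split(Delimiter).map(_.trim)
    if (fields.length >= 4 && fields(3).nonEmpty) Some(fields(3)) else None
  }
}

// In spark-shell (where sc is predefined) this could be applied as:
//   sc.textFile("fruits.csv")
//     .flatMap(line => FruitFields.fourthField(line).toList)
//     .distinct()
```

Keeping the parsing in a plain function makes it easy to check on a few sample lines before running it over the whole RDD.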
Converting to CSV is tricky because the data strings may contain your delimiter (inside quotes), newlines, or other parse-sensitive characters, so I'd recommend using spark-csv.
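To illustrate why writing CSV by hand is error-prone, here is a rough sketch of the quoting a writer has to apply to each field, assuming the common RFC 4180 convention (quote fields that contain a comma, a double quote, or a line break, and double any embedded quotes). CsvQuoting and quoteField are hypothetical names; a library such as spark-csv takes care of this for you.

```scala
// Hypothetical helper showing the quoting rules a CSV writer must follow.
object CsvQuoting {
  // Quote a field only when it contains a comma, a double quote, or a line break,
  // doubling any embedded double quotes.
  def quoteField(field: String): String = {
    val needsQuoting =
      field.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r')
    if (needsQuoting) "\"" + field.replace("\"", "\"\"") + "\"" else field
  }
}
```

Values like orange and apple pass through unchanged, but a value such as red, ripe would have to be written as "red, ripe" to stay in one column. From Spark 2.0 onward the CSV data source that started as spark-csv is built in, so the separate package should only be needed on older versions.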