How to get RDD[List[String]] to String and split it

Posted 2019-09-18 07:11

Question:

I have the scenario below, where I need to get the lines from a list and split them.

scala> var nonErroniousBidsMap = rawBids.filter(line => !(line(2).contains("ERROR_") || line(5) == null || line(5) == ""))
nonErroniousBidsMap: org.apache.spark.rdd.RDD[List[String]] = MapPartitionsRDD[108] at filter at <console>:33

scala> nonErroniousBidsMap.take(2).foreach(println)
List(0000002, 15-04-08-2016, 0.89, 0.92, 1.32, 2.07, , 1.35)
List(0000002, 11-05-08-2016, 0.92, 1.68, 0.81, 0.68, 1.59, , 1.63, 1.77, 2.06, 0.66, 1.53, , 0.32, 0.88, 0.83, 1.01)

scala> val transposeMap = nonErroniousBidsMap.map( rec => ( rec.split(",")(0) + "," + rec.split(",")(1) + ",US" + "," + rec.split(",")(5) ) )
<console>:35: error: value split is not a member of List[String]
     val transposeMap = nonErroniousBidsMap.map( rec => ( rec.split(",")(0) + "," + rec.split(",")(1) + ",US" + "," + rec.split(",")(5) ) )
                                                              ^

I am getting the error shown above. Can you please help me solve this?

Thank you.

Answer 1:

The type of rec is List[String], which does not have a split(String) method (as the compiler correctly reports). It looks like you're assuming your records are comma-separated Strings, but they're not: when you call println on each one, it is printed with comma separators only because that's how List.toString renders a list (the elements inside List(...), separated by ", ").
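A minimal sketch of the difference, run on plain Scala collections rather than Spark. The record values are copied from the question's output and assumed to contain no surrounding whitespace; the object name ListToStringDemo is made up for illustration. mkString is shown only as the way to get a real delimited String if one is actually needed.

```scala
object ListToStringDemo {
  // A record shaped like the ones in the question (assumed: no spaces around the fields)
  val rec: List[String] = List("0000002", "15-04-08-2016", "0.89")

  // println uses toString, which wraps the elements in "List(...)" and separates them with ", "
  val printed: String = rec.toString

  // rec.split(",") would not compile: List[String] has no split method.
  // If a comma-delimited String is really needed, build one explicitly:
  val line: String = rec.mkString(",")

  // Splitting that String recovers the original elements
  val roundTrip: List[String] = line.split(",").toList

  def main(args: Array[String]): Unit = {
    println(printed)          // List(0000002, 15-04-08-2016, 0.89)
    println(line)             // 0000002,15-04-08-2016,0.89
    println(roundTrip == rec) // true
  }
}
```

Going through mkString and split is a detour, though: the fields are already separate list elements, so indexing the list directly is simpler.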

You can simply remove all the calls to split(",") and index into the list directly:

nonErroniousBidsMap.map(rec => rec.head + "," + rec(1) + ",US" + "," + rec(5))
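To check the fix outside Spark, here is a sketch that applies the question's filter and the answer's map to a plain List[List[String]]. RDD.filter and RDD.map accept functions of the same shape, so the lambdas carry over unchanged. The two real rows are copied from the question's output (assumed to have no surrounding whitespace); the third row and its "ERROR_HYPOTHETICAL" value are made up to show the filter dropping an error record.

```scala
object TransposeSketch {
  val rawBids: List[List[String]] = List(
    List("0000002", "15-04-08-2016", "0.89", "0.92", "1.32", "2.07", "", "1.35"),
    List("0000002", "11-05-08-2016", "0.92", "1.68", "0.81", "0.68", "1.59", "", "1.63",
      "1.77", "2.06", "0.66", "1.53", "", "0.32", "0.88", "0.83", "1.01"),
    // Hypothetical error row; || short-circuits, so line(5) is never read for it
    List("0000003", "12-05-08-2016", "ERROR_HYPOTHETICAL")
  )

  // Same predicate as the question: drop ERROR_ rows and rows with a null or empty sixth field
  val nonErroniousBids: List[List[String]] =
    rawBids.filter(line => !(line(2).contains("ERROR_") || line(5) == null || line(5) == ""))

  // The answer's fix: index the list instead of calling split
  val transposed: List[String] =
    nonErroniousBids.map(rec => rec.head + "," + rec(1) + ",US" + "," + rec(5))

  def main(args: Array[String]): Unit =
    transposed.foreach(println)
    // 0000002,15-04-08-2016,US,2.07
    // 0000002,11-05-08-2016,US,0.68
}
```

On the real RDD the same map call yields an RDD[String].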

Or, more elegantly, using Scala's string interpolation:

nonErroniousBidsMap.map(rec => s"${rec.head},${rec(1)},US,${rec(5)}")
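A small sketch confirming that the interpolated version builds the same String as the concatenation. It uses the first record from the question's output (assumed to have no surrounding whitespace); the object name InterpolationSketch is made up for illustration.

```scala
object InterpolationSketch {
  val rec: List[String] = List("0000002", "15-04-08-2016", "0.89", "0.92", "1.32", "2.07", "", "1.35")

  // Concatenation, as in the first version of the answer
  val concatenated: String = rec.head + "," + rec(1) + ",US" + "," + rec(5)

  // The s interpolator evaluates each ${...} expression and splices its value into the literal
  val interpolated: String = s"${rec.head},${rec(1)},US,${rec(5)}"

  def main(args: Array[String]): Unit = {
    println(interpolated)                 // 0000002,15-04-08-2016,US,2.07
    println(interpolated == concatenated) // true
  }
}
```

The braces are required here because rec.head and rec(1) are expressions, not simple identifiers.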