spark - scala: not a member of org.apache.spark.sq

Posted 2019-06-19 14:01

I am trying to convert a DataFrame to an RDD and then map each row to a tuple:

df.rdd.map { t=>
 (t._2 + "_" + t._3 , t)
}.take(5)

Then I got the error below. Does anyone have any ideas? Thanks!

<console>:37: error: value _2 is not a member of org.apache.spark.sql.Row
               (t._2 + "_" + t._3 , t)
                  ^

Tags: scala apache-spark apache-spark-sql rdd spark-dataframe

2 answers

叛逆

Reply #2 · 2019-06-19 14:42

When you convert a DataFrame to an RDD, you get an RDD[Row], so the function you pass to map receives a Row as its parameter. A Row is not a tuple, which is why _2 and _3 do not exist on it. You must use the Row methods to access its fields instead (note that the index starts from 0):

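import org.apache.spark.sql.Row

// getString(i) is zero-based and assumes the column at that position holds a String
df.rdd.map { 
  row: Row => (row.getString(1) + "_" + row.getString(2), row)
}.take(5)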

You can view more examples and check all methods available for Row objects in the Spark scaladoc.
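To make the accessors concrete, here is a minimal sketch that can be pasted into spark-shell. It builds a Row by hand, so the values "a", "b", "c" are made-up data rather than anything from the question, and it assumes all three fields are non-null strings.

```scala
import org.apache.spark.sql.Row

// A hypothetical three-field row, built by hand so no DataFrame is needed.
val row: Row = Row("a", "b", "c")

// Typed accessors: positions are zero-based, so index 1 is the second field.
val second: String = row.getString(1)
val third: String = row.getAs[String](2)
val typedKey: String = second + "_" + third

// apply(i), written row(i), returns Any, so convert before concatenating.
// Calling toString on a null field would throw, hence the non-null assumption.
val untypedKey: String = row(1).toString + "_" + row(2).toString
```

Both keys come out the same here. The typed accessors fail with a cast error when the field is not actually a String, while toString works for any non-null value.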

Edit: I don't know the reason why you are doing this operation, but for concatenating String columns of a DataFrame you may consider the following option:

import org.apache.spark.sql.functions._
val newDF = df.withColumn("concat", concat(df("col2"), lit("_"), df("col3")))
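For anyone who wants to try the column-based approach end to end, the following is a rough sketch on a tiny local DataFrame. The session settings, the sample values, and the column names col1, col2, col3 are all assumptions made for illustration; in spark-shell a `spark` session already exists and the builder line can be skipped.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat, lit}

// Local session for experimentation only (assumed setup, not from the question).
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("concat-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data with assumed column names.
val df = Seq(("x", "a", "b"), ("y", "c", "d")).toDF("col1", "col2", "col3")

// Add a new column joining col2 and col3 with an underscore.
val newDF = df.withColumn("concat", concat(col("col2"), lit("_"), col("col3")))

// Pull the new column back to the driver to inspect it.
val keys: Seq[String] = newDF.select("concat").as[String].collect().toSeq

spark.stop()
```

This keeps the work inside the DataFrame API, so no conversion to an RDD is needed. One thing to keep in mind is that concat yields null when any of its inputs is null; concat_ws("_", ...) from the same functions object skips nulls instead.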


我想做一个坏孩纸

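Reply #3 · 2019-06-19 14:52

You can access each element of a Row by position, as if it were a List or Array, using (index). You can also use the get method. Both return Any, so convert the value before concatenating. Keep in mind that positions are zero-based, so t(2) is the third column.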

For example:

df.rdd.map {t =>
  (t(2).toString + "_" + t(3).toString, t)
}.take(5)
