可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I have to join two dataframes, which is very similar to the task given here Joining two DataFrames in Spark SQL and selecting columns of only one

However, I want to select only the second column from df2. In my task, I am going to use the join function for two dataframes within a reduce function for a list of dataframes. In this list of dataframes, the column names will be different. However, in each case I would want to keep the second column of df2.

I did not find anywhere how to select a dataframe's column by their numbered index. Any help is appreciated!

EDIT:

ANSWER

I figured out the solution. Here is one way to do this:

def joinDFs(df1: DataFrame, df2: DataFrame): DataFrame = {
  val df2cols = df2.columns
  val desiredDf2Col = df2cols(1)  // the second column
  val df3 = df1.as("df1").join(df2.as("df2"), $"df1.time" === $"df2.time")
      .select($"df1.*",$"df2.$desiredDf2Col")
  df3
}

And then I can apply this function in a reduce operation on a list of dataframes.

var listOfDFs: List[DataFrame] = List()
// Populate listOfDFs as you want here
val joinedDF = listOfDFs.reduceLeft((x, y) => {joinDFs(x, y)})

回答1:

To select the second column in your dataframe you can simply do:

val df3 = df2.select(df2.columns(1))

This will first find the second column name and then select it.

回答2:

If the join and select methods that you want to define in reduce function is similar to Joining two DataFrames in Spark SQL and selecting columns of only one
Then you should do the following :

import org.apache.spark.sql.functions._
d1.as("d1").join(d2.as("d2"), $"d1.id" === $"d2.id").select(Seq(1) map d2.columns map col: _*)

You will have to remember that the name of the second column i.e. Seq(1) should not be same as any of the dataframes column names.
You can select multiple columns as well but remember the bold note above

import org.apache.spark.sql.functions._
d1.as("d1").join(d2.as("d2"), $"d1.id" === $"d2.id").select(Seq(1, 2) map d2.columns map col: _*)

How to join two dataframes in Scala and select on

问题:

回答1:

回答2:

收藏的人(0)

How to join two dataframes in Scala and select on

问题:

回答1:

回答2:

收藏的人(0)

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮