This follows up on my previous question (Last question), which was as follows.
Table 1 -- ID pairs table
Table 2 -- Attribute table
Table 3
For example, id1 and id2 differ in both color and size, so the id1/id2 row (2nd row in Table 3) is "id1 id2 0 0";
id1 and id3 have the same color but different sizes, so the id1/id3 row (3rd row in Table 3) is "id1 id3 1 0".
Same attribute: 1. Different attribute: 0.
But what if I do not know how many attribute columns Table 2 has? For instance, I may not know the column names color and size in advance, and there may be another column called brand. How can I get Table 3 then?
The following solution should work for any number of attributes in Table 2, including ones whose names are not known in advance. I have adapted the answer from your Last Question.
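// Needed outside spark-shell; assumes a SparkSession named `spark` is in scope.
import org.apache.spark.sql.functions.{col, when}
import spark.implicits._

// Table 1: the ID pairs to compare.
val t1 = List(
  ("id1","id2"),
  ("id1","id3"),
  ("id2","id3")
).toDF("id_x", "id_y")

// Table 2: one row per ID; every column after "id" is an attribute.
val t2 = List(
  ("id1","blue","m","brand1"),
  ("id2","red","s","brand1"),
  ("id3","blue","s","brand2")
).toDF("id", "color", "size", "brand")

// Attribute names are read from the schema, so none are hard-coded.
val outSchema = t2.columns.tail

// Join Table 2 twice, once for each side of the pair.
var t3 = t1
  .join(t2.as("x"), $"id_x" === $"x.id", "inner")
  .join(t2.as("y"), $"id_y" === $"y.id", "inner")

// Replace each pair of attribute columns with a single 1/0 match flag.
for (columnName <- outSchema) {
  t3 = t3.withColumn(columnName+"s", when(col(s"x.$columnName") === col(s"y.$columnName"),1).otherwise(0))
    .drop(columnName)
    .drop("id")
    .withColumnRenamed(columnName+"s", columnName)
}

t3.show(false)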
The final output is:
+----+----+-----+----+-----+
|id_x|id_y|color|size|brand|
+----+----+-----+----+-----+
|id1 |id2 |0    |0    |1    |
|id1 |id3 |1    |0    |0    |
|id2 |id3 |0    |1    |0    |
+----+----+-----+----+-----+
Because the attribute names come from t2.columns.tail, adding or removing attribute columns in Table 2 requires no change to the code.
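If you would rather avoid the mutable var and the temporary renamed columns, the same idea can probably be written as a single select. This is only a sketch: the object name CompareAttributes and the helper compare are made up for illustration, and it assumes the ID column of Table 2 is literally named "id" and the pair columns are "id_x" and "id_y", as in the example data.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object CompareAttributes {
  // Hypothetical helper: returns id_x, id_y and one 1/0 flag per attribute column.
  def compare(pairs: DataFrame, attrs: DataFrame): DataFrame = {
    // Assumption: every column of `attrs` other than "id" is an attribute.
    val attributeNames = attrs.columns.filter(_ != "id")

    // One flag expression per attribute: 1 if both sides match, otherwise 0.
    val flags = attributeNames.map { name =>
      when(col(s"x.$name") === col(s"y.$name"), 1).otherwise(0).as(name)
    }

    pairs
      .join(attrs.as("x"), col("id_x") === col("x.id"))
      .join(attrs.as("y"), col("id_y") === col("y.id"))
      .select(Seq(col("id_x"), col("id_y")) ++ flags: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("compare-attributes")
      .getOrCreate()
    import spark.implicits._

    val t1 = List(("id1", "id2"), ("id1", "id3"), ("id2", "id3")).toDF("id_x", "id_y")
    val t2 = List(
      ("id1", "blue", "m", "brand1"),
      ("id2", "red", "s", "brand1"),
      ("id3", "blue", "s", "brand2")
    ).toDF("id", "color", "size", "brand")

    compare(t1, t2).show(false)
    spark.stop()
  }
}
```

Note that with ===, a pair where either value is null yields 0, including the case where both are null. If two nulls should count as a match, the null-safe operator <=> could be used in place of ===.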