How to filter duplicate records having multiple ke

Posted 2020-02-15 07:17

Question:

I have two DataFrames. I want to delete the records in Data Frame-A whose values in a set of key columns also appear in Data Frame-B.

For example, Data Frame-A:

A B C D
1 2 3 4
3 4 5 7
4 7 9 6
2 5 7 9


Data Frame-B:

A B C D
1 2 3 7
2 5 7 4
2 9 8 7


Keys: A,B,C columns

Desired Output:

A B C D
3 4 5 7
4 7 9 6

Is there a solution for this?

Answer 1:

You are looking for a left anti-join, which keeps only the rows of df_a that have no match in df_b on the key columns:

df_a.join(df_b, Seq("A","B","C"), "leftanti").show()
+---+---+---+---+
|  A|  B|  C|  D|
+---+---+---+---+
|  3|  4|  5|  7|
|  4|  7|  9|  6|
+---+---+---+---+
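
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It assumes a local Spark installation with the Spark SQL dependency on the classpath; the object name LeftAntiJoinExample, the helper dropMatching, and the local[*] master setting are illustrative choices, not part of the original answer. The sample rows are the ones from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object LeftAntiJoinExample {
  // Hypothetical helper: keep the rows of `left` whose values in the
  // `keys` columns have no matching row in `right`.
  def dropMatching(left: DataFrame, right: DataFrame, keys: Seq[String]): DataFrame =
    left.join(right, keys, "leftanti")

  def main(args: Array[String]): Unit = {
    // Assumption: run locally; adjust master/appName for a real cluster.
    val spark = SparkSession.builder()
      .appName("left-anti-join-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Data Frame-A from the question.
    val df_a = Seq((1, 2, 3, 4), (3, 4, 5, 7), (4, 7, 9, 6), (2, 5, 7, 9))
      .toDF("A", "B", "C", "D")
    // Data Frame-B from the question.
    val df_b = Seq((1, 2, 3, 7), (2, 5, 7, 4), (2, 9, 8, 7))
      .toDF("A", "B", "C", "D")

    // Only A, B, C are compared; column D of df_b is ignored.
    dropMatching(df_a, df_b, Seq("A", "B", "C")).show()

    spark.stop()
  }
}
```

The result contains only the columns of df_a, so the differing D values in df_b play no role. Row order in the output is not guaranteed; add an explicit orderBy if it matters.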