I have two tables and I would like to append them so that only all the data in table A is retained and data from table B is only added if its key is unique (Key values are unique in table A and B however in some cases a Key will occur in both table A and B).
I think the way to do this will involve some sort of filtering join (anti-join) to get values in table B that do not occur in table A then append the two tables.
I am familiar with R and this is the code I would use to do this in R.
library("dplyr")
## Filtering join to remove values already in "TableA" from "TableB"
FilteredTableB <- anti_join(TableB,TableA, by = "Key")
## Append "FilteredTableB" to "TableA"
CombinedTable <- bind_rows(TableA,FilteredTableB)
How would I achieve this in python?
Consider the following dataframes
This is one way to do what you want
Method 1
Method 2
Timing
4 rows with 2 overlap
Method 1 is much quicker
10,000 rows 5,000 overlap
loops are bad
I had the same problem. This answer using
how='outer'
andindicator=True
of merge inspired me to come up with this solution:Which prints this output:
Easiest answer imaginable:
Should be the fastest proposed as well.
You'll have both tables
TableA
andTableB
such that bothDataFrame
objects have columns with unique values in their respective tables, but some columns may have values that occur simultaneously (have the same values for a row) in both tables.Then, we want to merge the rows in
TableA
with the rows inTableB
that don't match any inTableA
for a 'Key' column. The concept is to picture it as comparing two series of variable length, and combining the rows in one seriessA
with the othersB
ifsB
's values don't matchsA
's. The following code solves this exercise:Note this affects
TableB
's data as well. You can useinplace=False
and re-assign it to anewTable
, thenTableA.append(newTable)
alternatively.