I have two DataFrames. df1 is a table I need to pull values from, using (index, column) pairs retrieved from multiple columns in df2.
I see there is a function get_value which works perfectly when given an index and column value, but when trying to vectorize this function to create a new column I am failing...
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.arange(20).reshape((4, 5)))
df1.columns = list('abcde')
df1.index = ['cat', 'dog', 'fish', 'bird']
       a   b   c   d   e
cat    0   1   2   3   4
dog    5   6   7   8   9
fish  10  11  12  13  14
bird  15  16  17  18  19
df1.get_value('bird', 'c')
17
Now what I need to do is create an entire new column on df2 by indexing df1 with the (index, column) pairs taken from the animal and letter columns of df2, effectively vectorizing the get_value function above.
df2 = pd.DataFrame(np.arange(20).reshape((4, 5)))
df2['animal'] = ['cat', 'dog', 'fish', 'bird']
df2['letter'] = list('abcd')
    0   1   2   3   4 animal letter
0   0   1   2   3   4    cat      a
1   5   6   7   8   9    dog      b
2  10  11  12  13  14   fish      c
3  15  16  17  18  19   bird      d
resulting in . . .
    0   1   2   3   4 animal letter  looked_up
0   0   1   2   3   4    cat      a          0
1   5   6   7   8   9    dog      b          6
2  10  11  12  13  14   fish      c         12
3  15  16  17  18  19   bird      d         18
If you are looking for a somewhat faster approach, zip will help in the case of a small dataframe, i.e. pair up the animal and letter columns with zip and look each pair up in df1.
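The code for this answer did not survive, so here is a minimal sketch of the zip approach rather than the original snippet. It assumes a recent pandas, where get_value is no longer available, and uses the scalar accessor .at in its place; the frames are the ones from the question.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(20).reshape((4, 5)),
                   index=['cat', 'dog', 'fish', 'bird'],
                   columns=list('abcde'))
df2 = pd.DataFrame(np.arange(20).reshape((4, 5)))
df2['animal'] = ['cat', 'dog', 'fish', 'bird']
df2['letter'] = list('abcd')

# Walk the (animal, letter) pairs in lockstep and fetch one scalar per pair.
# Assumption: .at stands in for get_value on newer pandas versions.
df2['looked_up'] = [df1.at[r, c] for r, c in zip(df2['animal'], df2['letter'])]

print(df2['looked_up'].tolist())  # [0, 6, 12, 18]
```

This is still a Python-level loop, so it is likely to pay off mainly on small frames, as the answer says.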
As John suggested you can simplify the code which will be much faster.
In case of missing data, use an if/else inside the comprehension, i.e.
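One possible reading of the if/else idea, sketched under assumptions: "missing" is taken to mean an (animal, letter) pair whose row or column label is absent from df1, and .at is used in place of get_value. The extra rows in df2 ('snake', 'z') are hypothetical and only there to trigger the else branch.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(20).reshape((4, 5)),
                   index=['cat', 'dog', 'fish', 'bird'],
                   columns=list('abcde'))

# Hypothetical df2: 'z' is not a column of df1 and 'snake' is not in its index.
df2 = pd.DataFrame({'animal': ['cat', 'dog', 'snake'],
                    'letter': ['a', 'z', 'c']})

# Look the pair up only when both labels exist in df1, otherwise emit NaN.
df2['looked_up'] = [
    df1.at[r, c] if r in df1.index and c in df1.columns else np.nan
    for r, c in zip(df2['animal'], df2['letter'])
]

print(df2)
```

Because NaN is a float, the resulting column is float64 rather than integer.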
For small dataframes
For large dataframe
There's a function aptly named lookup that does exactly this.

lookup and get_value are great answers if your values exist in the lookup dataframe. However, if you have (row, column) pairs not present in the lookup dataframe and want the lookup value to be NaN, merge and stack is one way to do it.

Test with adding a non-existing (animal, letter) pair:
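A minimal sketch of the merge and stack idea, reconstructed from the description above rather than from the answer's original code. The added ('snake', 'z') row and the name long_df1 are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(20).reshape((4, 5)),
                   index=['cat', 'dog', 'fish', 'bird'],
                   columns=list('abcde'))

# df2 from the question plus one pair that does not exist in df1.
df2 = pd.DataFrame({'animal': ['cat', 'dog', 'fish', 'bird', 'snake'],
                    'letter': list('abcdz')})

# stack turns df1 into long form: one row per (animal, letter) pair.
long_df1 = (df1.stack()
               .rename_axis(['animal', 'letter'])
               .reset_index(name='looked_up'))

# A left merge keeps every row of df2; pairs with no match get NaN.
result = df2.merge(long_df1, on=['animal', 'letter'], how='left')

print(result)
```

The left merge preserves the row order of df2, and the unmatched ('snake', 'z') pair comes back as NaN instead of raising a KeyError.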