vlookup in Pandas using join

2019-01-04 07:25发布

I have the following 2 dataframes

Example1
sku loc flag  
122  61 True 
123  61 True
113  62 True 
122  62 True 
123  62 False
122  63 False
301  63 True 

Example2 
sku dept 
113 a
122 b
123 b
301 c 

I want to perform a merge, or join opertation using Pandas (or whichever Python operator is best) to produce the below data frame.

Example3
sku loc flag   dept  
122  61 True   b
123  61 True   b
113  62 True   a
122  62 True   b
123  62 False  b
122  63 False  b
301  63 True   c

Both 
df_Example1.join(df_Example2,lsuffix='_ProdHier')
df_Example1.join(df_Example2,how='outer',lsuffix='_ProdHier')

Aren't working. What am I doing wrong?

3条回答
乱世女痞
2楼-- · 2019-01-04 08:21

Perform a left merge, this will use sku column as the column to join on:

In [26]:

df.merge(df1, on='sku', how='left')
Out[26]:
   sku  loc   flag dept
0  122   61   True    b
1  122   62   True    b
2  122   63  False    b
3  123   61   True    b
4  123   62  False    b
5  113   62   True    a
6  301   63   True    c

If sku is in fact your index then do this:

In [28]:

df.merge(df1, left_index=True, right_index=True, how='left')
Out[28]:
     loc   flag dept
sku                 
113   62   True    a
122   61   True    b
122   62   True    b
122   63  False    b
123   61   True    b
123   62  False    b
301   63   True    c

Another method is to use map, if you set sku as the index on your second df, so in effect it becomes a Series then the code simplifies to this:

In [19]:

df['dept']=df.sku.map(df1.dept)
df
Out[19]:
   sku  loc   flag dept
0  122   61   True    b
1  123   61   True    b
2  113   62   True    a
3  122   62   True    b
4  123   62  False    b
5  122   63  False    b
6  301   63   True    c
查看更多
仙女界的扛把子
3楼-- · 2019-01-04 08:21

A more generic application would be to use apply and lambda as follows:

dict1 = {113:'a',
         122:'b',
         123:'b',
         301:'c'}

df = pd.DataFrame([['1', 113],
                   ['2', 113],
                   ['3', 301],
                   ['4', 122],
                   ['5', 113]], columns=['num', 'num_letter'])

Add as a new dataframe column

 **df['letter'] = df['num_letter'].apply(lambda x: dict1[x])**

  num  num_letter letter
0   1         113      a
1   2         113      a
2   3         301      c
3   4         122      b
4   5         113      a

OR replace the existing ('num_letter') column

 **df['num_letter'] = df['num_letter'].apply(lambda x: dict1[x])**

  num num_letter
0   1          a
1   2          a
2   3          c
3   4          b
4   5          a
查看更多
男人必须洒脱
4楼-- · 2019-01-04 08:29

VLoopup in VBA is just like pandas.dataframe.merge

I always look for so many procedures for VBA in the past and now python dataframe saves me a ton of work, good thing is I don't need write a vlookup method.

pandas.DataFrame.merge

>>> A              >>> B
    lkey value         rkey value
0   foo  1         0   foo  5
1   bar  2         1   bar  6
2   baz  3         2   qux  7
3   foo  4         3   bar  8
>>> A.merge(B, left_on='lkey', right_on='rkey', how='outer')
   lkey  value_x  rkey  value_y
0  foo   1        foo   5
1  foo   4        foo   5
2  bar   2        bar   6
3  bar   2        bar   8
4  baz   3        NaN   NaN
5  NaN   NaN      qux   7

You can also try the following to do a left merge.

import pandas as pd
pd.merge(left, right, left_on = 'key', right_on = 'key', how='left')

outer or left act like SQL, python's built-in class DataFrame has the method merge taking many args, which is very detailed and handy.

查看更多
登录 后发表回答