Python Pandas: How to replace values in a Datafram

2019-07-13 12:40发布

问题:

I have a DataFrame as follows. Both columns have Member_ID which indicates which Member_ID connected with other Member_ID

   col1  col2
    1     3
    1     4
    1     5
    2     3
    2     4
    3     1
    3     2
    3     5
    4     1
    4     2
    5     1
    5     3
    
and I have calculated each Member_ID connected with how many Member_ID. For example Member_ID 1 is connected with 3 Member_ID. If an Member_ID contains more or equal to 3 connections we have to put "a" in front of the Member_Id else we have to put "b" so the label we have to give the label as "a1" for Member_ID 1. Likewise I have calculated the labels for each Member_Id and the label array is below.
 member_ID   No_of_con  Label
    1          3         a1
    2          2         b2
    3          3         a3
    4          2         b4
    5          2         b5
    

Now I have to replace the first Dataframe's values referring from the label array. Dataframe is big for using for loops is not efficient So how can i achive this using Pandas in simpler way? I'm expecting the result as below

    col1     col2
    a1         a3
    a1         b4
    a1         b5
    b2         a3
    b2         b4
    a3         a1
    a3         b2
    a3         b5
    b4         a1
    b4         b2
    b5         a1
    b5         a3

回答1:

we can stack, map and unstack:

In [9]: d1.stack().map(d2.set_index('member_ID')['Label']).unstack()
Out[9]:
   col1 col2
0    a1   a3
1    a1   b4
2    a1   b5
3    b2   a3
4    b2   b4
5    a3   a1
6    a3   b2
7    a3   b5
8    b4   a1
9    b4   b2
10   b5   a1
11   b5   a3


回答2:

Or you can try this

df2.set_index('member_ID',inplace=True)
df1.apply(lambda x: x.map(df2['Label']))


   col1 col2
0    a1   a3
1    a1   b4
2    a1   b5
3    b2   a3
4    b2   b4
5    a3   a1
6    a3   b2
7    a3   b5
8    b4   a1
9    b4   b2
10   b5   a1
11   b5   a3


回答3:

You can use pd.DataFrame.replace using a pd.Series in a dictionary context.

d1.replace(d2.set_index('member_ID').Label)

   col1 col2
0    a1   a3
1    a1   b4
2    a1   b5
3    b2   a3
4    b2   b4
5    a3   a1
6    a3   b2
7    a3   b5
8    b4   a1
9    b4   b2
10   b5   a1
11   b5   a3