Melt a pandas DataFrame

2019-05-11 22:50发布

I have a pandas DataFrame like this:

df = pd.DataFrame({'custid':[1,2,3,4],
...: 'prod1':['jeans','tshirt','jacket','tshirt'],
...: 'prod1_hnode1':[1,2,3,2],
...: 'prod1_hnode2':[6,7,8,7],
...: 'prod2':['tshirt','jeans','jacket','shirt'],
...: 'prod2_hnode1':[2,1,3,4],
...: 'prod2_hnode2':[7,6,8,7]})

In [54]: df
Out[54]: 
    custid   prod1  prod1_hnode1  prod1_hnode2   prod2  prod2_hnode1  \
0       1   jeans             1             6  tshirt             2   
1       2  tshirt             2             7   jeans             1   
2       3  jacket             3             8  jacket             3   
3       4  tshirt             2             7   shirt             4   

   prod2_hnode2  
0             7  
1             6  
2             8  
3             7  

How can I convert this to the following format:

dfnew = pd.DataFrame({'custid':[1,1,2,2,3,3,4,4],
...: 'prod':['prod1','prod2','prod1','prod2','prod1','prod2','prod1','prod2'],
...: 'rec':['jeans','tshirt','tshirt','jeans','jacket','jacket','tshirt','shirt'],
...: 'hnode1':[1,2,2,1,3,3,2,4],
...: 'hnode2':[6,7,7,6,8,8,7,7]})


In [56]: dfnew
Out[56]: 
   custid  hnode1  hnode2   prod     rec
0       1       1       6  prod1   jeans
1       1       2       7  prod2  tshirt
2       2       2       7  prod1  tshirt
3       2       1       6  prod2   jeans
4       3       3       8  prod1  jacket
5       3       3       8  prod2  jacket
6       4       2       7  prod1  tshirt
7       4       4       7  prod2   shirt

2条回答
Viruses.
2楼-- · 2019-05-11 23:27

Use:

  • set_index by column custid
  • create MultiIndex in columns by split
  • replace NaNs in columns by rec
  • stack by first level
  • reset_index for columns from MultiIndex
  • rename column

df = df.set_index('custid')
df.columns = df.columns.str.split('_', expand=True)
df = df.rename(columns={np.nan:'rec'})
cols = ['custid','hnode1','hnode2','prod','rec']
df = df.stack(0).reset_index().rename(columns={'level_1':'prod'}).reindex(columns=cols)
print (df)
   custid  hnode1  hnode2   prod     rec
0       1       1       6  prod1   jeans
1       1       2       7  prod2  tshirt
2       2       2       7  prod1  tshirt
3       2       1       6  prod2   jeans
4       3       3       8  prod1  jacket
5       3       3       8  prod2  jacket
6       4       2       7  prod1  tshirt
7       4       4       7  prod2   shirt
查看更多
beautiful°
3楼-- · 2019-05-11 23:36

Here's another approach that should work, but uses repeated melts.

coln = df.dtypes.index  # save some typing
df_long = pd.melt(
    df, id_vars = "custid", value_vars = ["prod1", "prod2"],
    var_name = "prod", value_name = "rec").assign(
    hnode1 = pd.melt(df, id_vars = "custid", 
                     value_vars = filter(lambda x: "hnode1" in x, coln))["value"],
    hnode2 = pd.melt(df, id_vars = "custid", 
                     value_vars = filter(lambda x: "hnode2" in x, coln))["value"])
print(df_long)
   custid   prod     rec  hnode1  hnode2
0       1  prod1   jeans       1       6
1       2  prod1  tshirt       2       7
2       3  prod1  jacket       3       8
3       4  prod1  tshirt       2       7
4       1  prod2  tshirt       2       7
5       2  prod2   jeans       1       6
6       3  prod2  jacket       3       8
7       4  prod2   shirt       4       7

You mention R in your comments. melt from "data.table" should be able to handle this much more easily since you can melt multiple sets of columns at once, similar to how you might approach the problem with base R's reshape function.

The base R approach might be something along the lines of:

reshape(df, direction = "long", idvar = "custid", 
        varying = list(c(2, 5), c(3, 6), c(4, 7)), 
        sep = "", times = c("prod1", "prod2"))
查看更多
登录 后发表回答