Melt a pandas DataFrame

I have a pandas DataFrame like this:

df = pd.DataFrame({'custid':[1,2,3,4],
...: 'prod1':['jeans','tshirt','jacket','tshirt'],
...: 'prod1_hnode1':[1,2,3,2],
...: 'prod1_hnode2':[6,7,8,7],
...: 'prod2':['tshirt','jeans','jacket','shirt'],
...: 'prod2_hnode1':[2,1,3,4],
...: 'prod2_hnode2':[7,6,8,7]})

In [54]: df
Out[54]: 
    custid   prod1  prod1_hnode1  prod1_hnode2   prod2  prod2_hnode1  \
0       1   jeans             1             6  tshirt             2   
1       2  tshirt             2             7   jeans             1   
2       3  jacket             3             8  jacket             3   
3       4  tshirt             2             7   shirt             4   

   prod2_hnode2  
0             7  
1             6  
2             8  
3             7

How can I convert this to the following format:

dfnew = pd.DataFrame({'custid':[1,1,2,2,3,3,4,4],
...: 'prod':['prod1','prod2','prod1','prod2','prod1','prod2','prod1','prod2'],
...: 'rec':['jeans','tshirt','tshirt','jeans','jacket','jacket','tshirt','shirt'],
...: 'hnode1':[1,2,2,1,3,3,2,4],
...: 'hnode2':[6,7,7,6,8,8,7,7]})


In [56]: dfnew
Out[56]: 
   custid  hnode1  hnode2   prod     rec
0       1       1       6  prod1   jeans
1       1       2       7  prod2  tshirt
2       2       2       7  prod1  tshirt
3       2       1       6  prod2   jeans
4       3       3       8  prod1  jacket
5       3       3       8  prod2  jacket
6       4       2       7  prod1  tshirt
7       4       4       7  prod2   shirt

标签： python pandas reshape melt

2条回答

Viruses.

2楼-- · 2019-05-11 23:27

Use:

set_index by column custid
create MultiIndex in columns by split
replace NaNs in columns by rec
stack by first level
reset_index for columns from MultiIndex
rename column

df = df.set_index('custid')
df.columns = df.columns.str.split('_', expand=True)
df = df.rename(columns={np.nan:'rec'})
cols = ['custid','hnode1','hnode2','prod','rec']
df = df.stack(0).reset_index().rename(columns={'level_1':'prod'}).reindex(columns=cols)
print (df)
   custid  hnode1  hnode2   prod     rec
0       1       1       6  prod1   jeans
1       1       2       7  prod2  tshirt
2       2       2       7  prod1  tshirt
3       2       1       6  prod2   jeans
4       3       3       8  prod1  jacket
5       3       3       8  prod2  jacket
6       4       2       7  prod1  tshirt
7       4       4       7  prod2   shirt

0人赞添加讨论(0) 举报

beautiful°

3楼-- · 2019-05-11 23:36

Here's another approach that should work, but uses repeated melts.

coln = df.dtypes.index  # save some typing
df_long = pd.melt(
    df, id_vars = "custid", value_vars = ["prod1", "prod2"],
    var_name = "prod", value_name = "rec").assign(
    hnode1 = pd.melt(df, id_vars = "custid", 
                     value_vars = filter(lambda x: "hnode1" in x, coln))["value"],
    hnode2 = pd.melt(df, id_vars = "custid", 
                     value_vars = filter(lambda x: "hnode2" in x, coln))["value"])
print(df_long)
   custid   prod     rec  hnode1  hnode2
0       1  prod1   jeans       1       6
1       2  prod1  tshirt       2       7
2       3  prod1  jacket       3       8
3       4  prod1  tshirt       2       7
4       1  prod2  tshirt       2       7
5       2  prod2   jeans       1       6
6       3  prod2  jacket       3       8
7       4  prod2   shirt       4       7

You mention R in your comments. melt from "data.table" should be able to handle this much more easily since you can melt multiple sets of columns at once, similar to how you might approach the problem with base R's reshape function.

The base R approach might be something along the lines of:

reshape(df, direction = "long", idvar = "custid", 
        varying = list(c(2, 5), c(3, 6), c(4, 7)), 
        sep = "", times = c("prod1", "prod2"))

0人赞添加讨论(0) 举报

Melt a pandas DataFrame

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间