Using column name as a new attribute in pandas

2019-01-26 17:03发布

问题:

I have the following data structure

Date         Agric  Food 
01/01/1990    1.3   0.9  
01/02/1990    1.2   0.9 

I would like to covert it into the format

Date        Sector  Beta
01/01/1990  Agric   1.3
01/02/1990  Agric   1.2
01/01/1990  Food    0.9
01/02/1990  Food    0.9

while I am sure I can do this in a complicated way, is there a way of doing this in a few line of code?

回答1:

Using pd.DataFrame.melt

df.melt('Date', var_name='Sector', value_name='Beta')

         Date Sector  Beta
0  01/01/1990  Agric   1.3
1  01/02/1990  Agric   1.2
2  01/01/1990   Food   0.9
3  01/02/1990   Food   0.9


回答2:

Use set_index and stack:

df.set_index('Date').rename_axis('Sector',axis=1).stack()\
  .reset_index(name='Beta')

Output:

         Date Sector  Beta
0  01/01/1990  Agric   1.3
1  01/01/1990   Food   0.9
2  01/02/1990  Agric   1.2
3  01/02/1990   Food   0.9


回答3:

Or you can using lreshape

df=pd.lreshape(df2, {'Date': ["Date","Date"], 'Beta': ['Agric', 'Food']})
df['Sector']=sorted(df2.columns.tolist()[1:3]*2)

   Out[654]: 
         Date  Beta Sector
0  01/01/1990   1.3  Agric
1  01/02/1990   1.2  Agric
2  01/01/1990   0.9   Food
3  01/02/1990   0.9   Food

In case you have 48 columns

df=pd.lreshape(df2, {'Date':['Date']*2, 'Beta': df2.columns.tolist()[1:3]})
df['Sector']=sorted(df2.columns.tolist()[1:3]*2)

also for the columns Sector , it is more safety create it by

import itertools
list(itertools.chain.from_iterable(itertools.repeat(x, 2) for x in df2.columns.tolist()[1:3]))

EDIT Cause lreshap is undocumented (As per@ Ted Petrou It's best to use available DataFrame methods if possible and then if none available use documented functions. pandas is constantly looking to improve its API and calling undocumented, old and experimental functions like lreshape for anything is unwarranted. Furthermore, this problem is a very straightforward usecase for melt or stack. It is a bad precedent to set for those new to pandas to come to Stack Overflow and find upvoted answers with lreshape. )

Also , if you want to know more about this , you can check it at github

Below are the method by using pd.wide_to_long

dict1 = {'Agric':'A_Agric','Food':'A_Food'}
df2 = df.rename(columns=dict1)
pd.wide_to_long(df2.reset_index(),['A'],i='Date',j='Sector',sep='_',suffix='.').reset_index().drop('index',axis=1).rename(columns={'A':'Beta '})

Out[2149]: 
         Date Sector  Beta 
0  01/01/1990  Agric    1.3
1  01/02/1990  Agric    1.2
2  01/01/1990   Food    0.9
3  01/02/1990   Food    0.9