I have the following data structure
Date Agric Food
01/01/1990 1.3 0.9
01/02/1990 1.2 0.9
I would like to covert it into the format
Date Sector Beta
01/01/1990 Agric 1.3
01/02/1990 Agric 1.2
01/01/1990 Food 0.9
01/02/1990 Food 0.9
while I am sure I can do this in a complicated way, is there a way of doing this in a few line of code?
Using pd.DataFrame.melt
df.melt('Date', var_name='Sector', value_name='Beta')
Date Sector Beta
0 01/01/1990 Agric 1.3
1 01/02/1990 Agric 1.2
2 01/01/1990 Food 0.9
3 01/02/1990 Food 0.9
Use set_index
and stack
:
df.set_index('Date').rename_axis('Sector',axis=1).stack()\
.reset_index(name='Beta')
Output:
Date Sector Beta
0 01/01/1990 Agric 1.3
1 01/01/1990 Food 0.9
2 01/02/1990 Agric 1.2
3 01/02/1990 Food 0.9
Or you can using lreshape
df=pd.lreshape(df2, {'Date': ["Date","Date"], 'Beta': ['Agric', 'Food']})
df['Sector']=sorted(df2.columns.tolist()[1:3]*2)
Out[654]:
Date Beta Sector
0 01/01/1990 1.3 Agric
1 01/02/1990 1.2 Agric
2 01/01/1990 0.9 Food
3 01/02/1990 0.9 Food
In case you have 48 columns
df=pd.lreshape(df2, {'Date':['Date']*2, 'Beta': df2.columns.tolist()[1:3]})
df['Sector']=sorted(df2.columns.tolist()[1:3]*2)
also for the columns Sector , it is more safety create it by
import itertools
list(itertools.chain.from_iterable(itertools.repeat(x, 2) for x in df2.columns.tolist()[1:3]))
EDIT Cause lreshap
is undocumented (As per@ Ted Petrou It's best to use available DataFrame methods if possible and then if none available use documented functions. pandas is constantly looking to improve its API and calling undocumented, old and experimental functions like lreshape for anything is unwarranted. Furthermore, this problem is a very straightforward usecase for melt or stack. It is a bad precedent to set for those new to pandas to come to Stack Overflow and find upvoted answers with lreshape. )
Also , if you want to know more about this , you can check it at github
Below are the method by using pd.wide_to_long
dict1 = {'Agric':'A_Agric','Food':'A_Food'}
df2 = df.rename(columns=dict1)
pd.wide_to_long(df2.reset_index(),['A'],i='Date',j='Sector',sep='_',suffix='.').reset_index().drop('index',axis=1).rename(columns={'A':'Beta '})
Out[2149]:
Date Sector Beta
0 01/01/1990 Agric 1.3
1 01/02/1990 Agric 1.2
2 01/01/1990 Food 0.9
3 01/02/1990 Food 0.9