This question already has answers here:
Closed 2 years ago.
I'm having a little trouble with pivoting in pandas. The dataframe
(dates, location, data) I'm working on looks like:
dates location data
date1 A X
date2 A Y
date3 A Z
date1 B XX
date2 B YY
Basically, I'm trying to pivot on location to end up with a dataframe like:
dates A B C
date1 X XX etc...
date2 Y YY
date3 Z ZZ
Unfortunately when I pivot, the index, which is equivalent to the original dates column, does not change and I get:
dates A B C
date1 X NA etc...
date2 Y NA
date3 Z NA
date1 NA XX
date2 NA YY
Does anyone know how I can fix this issue to get the dataframe formate I'm looking for?
I'm current calling Pivot as such:
df.pivot(index="dates", columns="location")
because I have a # of data columns I want to pivot (don't want to list each one as an argument). I believe by default pivot pivots the rest of the columns in the dataframe.
Thanks.
If you have multiple data columns, calling pivot without the values columns should give you a pivoted frame with a MultiIndex as the columns:
In [3]: df
Out[3]:
columns data1 data2 index
0 a -0.602398 -0.982524 x
1 a 0.880927 0.818551 y
2 b -0.238849 0.766986 z
3 b -1.304346 0.955031 x
4 c -0.094820 0.746046 y
5 c -0.835785 1.123243 z
In [4]: df.pivot('index', 'columns')
Out[4]:
data1 data2
columns a b c a b c
index
x -0.602398 -1.304346 NaN -0.982524 0.955031 NaN
y 0.880927 NaN -0.094820 0.818551 NaN 0.746046
z NaN -0.238849 -0.835785 NaN 0.766986 1.123243
How are you calling DataFrame.pivot and what datatype is your dates column?
Suppose I have a DataFrame that's similar to yours, the dates columns contains datetime objects:
In [52]: df
Out[52]:
data dates loc
0 0.870900 2000-01-01 00:00:00 A
1 0.344999 2000-01-02 00:00:00 A
2 0.001729 2000-01-03 00:00:00 A
3 1.565684 2000-01-01 00:00:00 B
4 -0.851542 2000-01-02 00:00:00 B
In [53]: df.pivot('dates', 'loc', 'data')
Out[53]:
loc A B
dates
2000-01-01 0.870900 1.565684
2000-01-02 0.344999 -0.851542
2000-01-03 0.001729 NaN
Just answered my own question. I was using an old Sybase module to import data and I think it used an old DateTimeType object from mxDatetime. In that module, a datetime of Jan 01 2011 would not necessarily equal another datetime of Jan 01 2011 (e.g. each datetime was unique). Hence the dataframe pivot treated each column value as unique in the index.
Thanks for the help.