Reshape Long Format Multivalue Dataframes with Pan

2019-04-01 23:17发布

问题:

I would like to turn:

DateTime                     ColumnName        Min      Avg      Max                                                                                      
2012-10-14 11:29:23.810000   Percent_Used       24       24       24
2012-10-14 11:29:23.810000   Current_Count  254503   254503   254503
2012-10-14 11:29:23.810000   Max           1048576  1048576  1048576
2012-10-14 11:34:23.813000   Percent_Used       24       24       24
2012-10-14 11:34:23.813000   Current_Count  254116   254116   254116
2012-10-14 11:34:23.813000   Max           1048576  1048576  1048576

Into a dataframe where the the DateTimes are unique (an index) and the columns are:

DataTime, Percent_Used_Min, Percent_Used_Avg, Percent_Used_Max, Current_Count_Min, Current_Count_Avg, Current_Count_Max, Max_Min, Max_Avg, Max_Max

Basically, I want to mimic R's melt/cast without getting into hierarchical indexing or stacked dataframes. I can't seem to to get exactly the above playing with stack/unstack, melt, or pivot/pivot_table -- Is there a good way to do this?

As An example, in R it would be something like:

dynamic_melt = melt(dynamic, id = c("DateTime", "ColumnName"))
recast = data.frame(cast(dynamic_melt, DateTime ~ ...))

The above data will be variable (i.e. the values of ColumnName won't always be the same thing, there might be more or less of them, and different names).

回答1:

There is a melt in pandas.core.reshape:

In [52]: melted = reshape.melt(df, id_vars=['DateTime', 'ColumnName'])

In [53]: melted.set_index(['DateTime', 'ColumnName', 'variable']).value.unstack([1, 2])
Out[53]: 
ColumnName                  Percent_Used  Current_Count      Max  Percent_Used  Current_Count      Max  Percent_Used  Current_Count      Max
variable                             Min            Min      Min           Avg            Avg      Avg           Max            Max      Max
DateTime                                                                                                                                    
2012-10-14 11:29:23.810000            24         254503  1048576            24         254503  1048576            24         254503  1048576
2012-10-14 11:34:23.813000            24         254116  1048576            24         254116  1048576            24         254116  1048576

The columns end up being a MultiIndex, but if that's a deal breaker for you just concat the names and make it a regular Index.