What is the pandas.Panel deprecation warning actua

Posted 2020-02-23 09:01

Question:

I have a package that uses pandas Panels to generate MultiIndex pandas DataFrames. However, whenever I use pandas.Panel, I get the following DeprecationWarning:

DeprecationWarning: Panel is deprecated and will be removed in a future version. The recommended way to represent these types of 3-dimensional data are with a MultiIndex on a DataFrame, via the Panel.to_frame() method. Alternatively, you can use the xarray package http://xarray.pydata.org/en/stable/. Pandas provides a .to_xarray() method to help automate this conversion.

However, I can't understand what the first recommendation here is actually recommending in order to create MultiIndex DataFrames. If Panel is going to be removed, how am I going to be able to use Panel.to_frame?


To clarify: I am not asking what deprecation is, or how to convert my Panels to DataFrames. What I am asking is this: if a library uses pandas.Panel followed by pandas.Panel.to_frame to create MultiIndex DataFrames from 3D ndarrays, and Panel is going to be removed, what is the best way to build those DataFrames without using the Panel API?

E.g., if I'm doing the following, with X as an ndarray of shape (N, J, K):

p = pd.Panel(X, items=item_names, major_axis=names0, minor_axis=names1)
df = p.to_frame()

this is clearly no longer a viable future-proof option for DataFrame construction, though it was the recommended method in this question.

Answer 1:

Consider the following panel:

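import numpy as np
import pandas as pd

# 5 items x 3 major-axis labels x 2 minor-axis labels
# (pd.Panel below only works on pandas versions that still ship Panel)
data = np.random.randint(1, 10, (5, 3, 2))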
pnl = pd.Panel(
    data, 
    items=['item {}'.format(i) for i in range(1, 6)], 
    major_axis=[2015, 2016, 2017], 
    minor_axis=['US', 'UK']
)

If you convert this to a DataFrame, this becomes:

             item 1  item 2  item 3  item 4  item 5
major minor                                        
2015  US          9       6       3       2       5
      UK          8       3       7       7       9
2016  US          7       7       8       7       5
      UK          9       1       9       9       1
2017  US          1       8       1       3       1
      UK          6       8       8       1       6

So to_frame() uses the major and minor axes as the row MultiIndex and the items as columns. The shape changes from (5, 3, 2) to (6, 5). Which axes go into the MultiIndex is up to you, but if you want exactly this layout without Panel, you can do the following:

data = data.reshape(5, 6).T
df = pd.DataFrame(
    data=data,
    index=pd.MultiIndex.from_product([[2015, 2016, 2017], ['US', 'UK']]),
    columns=['item {}'.format(i) for i in range(1, 6)]
)

which yields the same DataFrame (use the names parameter of pd.MultiIndex.from_product if you want to name your indices):

         item 1  item 2  item 3  item 4  item 5
2015 US       9       6       3       2       5
     UK       8       3       7       7       9
2016 US       7       7       8       7       5
     UK       9       1       9       9       1
2017 US       1       8       1       3       1
     UK       6       8       8       1       6
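To check that the reshape puts every value in the right cell, and to show the names parameter mentioned above, here is a minimal sketch. The level names 'major' and 'minor' are chosen only to mirror the Panel output; any names would do.

```python
import numpy as np
import pandas as pd

years = [2015, 2016, 2017]
countries = ['US', 'UK']
items = ['item {}'.format(i) for i in range(1, 6)]

data = np.random.randint(1, 10, (5, 3, 2))  # (items, major, minor)

# names= labels the two index levels, as Panel.to_frame() did
index = pd.MultiIndex.from_product([years, countries], names=['major', 'minor'])

# (5, 3, 2) -> (5, 6) flattens (major, minor) with minor varying fastest,
# which is the same order from_product generates; .T puts items in columns
df = pd.DataFrame(data.reshape(5, 6).T, index=index, columns=items)

# data[i, j, k] should end up at row (years[j], countries[k]), column items[i]
for i, item in enumerate(items):
    for j, year in enumerate(years):
        for k, country in enumerate(countries):
            assert df.loc[(year, country), item] == data[i, j, k]
```

The loop is only a sanity check; it can be dropped once you trust the axis order.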

Now instead of pnl['item 1'], you use df['item 1'] (optionally df['item 1'].unstack()); instead of pnl.xs(2015) you use df.xs(2015); and instead of pnl.xs('US', axis='minor'), you use df.xs('US', level=1).
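The DataFrame-side selections can be tried out as follows. This is a small sketch that uses np.arange instead of random data so the values are predictable; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

data = np.arange(30).reshape(5, 3, 2)  # (items, major, minor)
df = pd.DataFrame(
    data.reshape(5, 6).T,
    index=pd.MultiIndex.from_product(
        [[2015, 2016, 2017], ['US', 'UK']], names=['major', 'minor']
    ),
    columns=['item {}'.format(i) for i in range(1, 6)],
)

# One item as a 2D table: years as rows, countries as columns
item1 = df['item 1'].unstack()

# All items for one year: countries as rows, items as columns
year_2015 = df.xs(2015)

# All items for one country: years as rows, items as columns
us = df.xs('US', level=1)
```

Note that level=1 refers to the second index level; with named levels, df.xs('US', level='minor') selects the same rows.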

As you can see, this is just a matter of reshaping your initial 3D numpy array to 2D. The remaining (artificial) dimension is carried by the MultiIndex.
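For the general case in the question, an ndarray X of shape (N, J, K), the same idea can be wrapped in a small function. This is only a sketch: array_to_frame is a hypothetical helper name, not a pandas API, and it assumes the first axis holds the items, as in the Panel constructor call above.

```python
import numpy as np
import pandas as pd


def array_to_frame(X, items, major, minor):
    """Build a MultiIndex DataFrame from a 3D array without pd.Panel.

    Hypothetical helper: X has shape (len(items), len(major), len(minor)).
    Rows are indexed by (major, minor); columns are the items.
    """
    X = np.asarray(X)
    n_items, n_major, n_minor = X.shape
    index = pd.MultiIndex.from_product([major, minor], names=['major', 'minor'])
    # Flatten the (major, minor) axes in C order, then put items in columns
    flat = X.reshape(n_items, n_major * n_minor).T
    return pd.DataFrame(flat, index=index, columns=list(items))


X = np.arange(24).reshape(2, 3, 4)
df = array_to_frame(X, ['a', 'b'], [10, 20, 30], ['w', 'x', 'y', 'z'])
```

With this, the pd.Panel(...).to_frame() pair from the question becomes a single call such as array_to_frame(X, item_names, names0, names1).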