What does axis in pandas mean?

2019-01-01 12:04发布

Here is my code to generate a dataframe:

import pandas as pd
import numpy as np

dff = pd.DataFrame(np.random.randn(1,2),columns=list('AB'))

then I got the dataframe:

+------------+---------+--------+
|            |  A      |  B     |
+------------+---------+---------
|      0     | 0.626386| 1.52325|
+------------+---------+--------+

When I type the commmand :

dff.mean(axis=1)

I got :

0    1.074821
dtype: float64

According to the reference of pandas, axis=1 stands for columns and I expect the result of the command to be

A    0.626386
B    1.523255
dtype: float64

So here is my question: what does axis in pandas mean?

14条回答
怪性笑人.
2楼-- · 2019-01-01 12:15

I'm a newbie to pandas. But this is how I understand axis in pandas:


Axis Constant Varying Direction


0 Column Row Downwards |


1 Row Column Towards Right -->


So to compute mean of a column, that particular column should be constant but the rows under that can change (varying) so it is axis=0.

Similarly, to compute mean of a row, that particular row is constant but it can traverse through different columns (varying), axis=1.

查看更多
路过你的时光
3楼-- · 2019-01-01 12:16

It specifies the axis along which the means are computed. By default axis=0. This is consistent with the numpy.mean usage when axis is specified explicitly (in numpy.mean, axis==None by default, which computes the mean value over the flattened array) , in which axis=0 along the rows (namely, index in pandas), and axis=1 along the columns. For added clarity, one may choose to specify axis='index' (instead of axis=0) or axis='columns' (instead of axis=1).

+------------+---------+--------+
|            |  A      |  B     |
+------------+---------+---------
|      0     | 0.626386| 1.52325|----axis=1----->
+------------+---------+--------+
             |         |
             | axis=0  |
             ↓         ↓
查看更多
春风洒进眼中
4楼-- · 2019-01-01 12:21

axis refers to the dimension of the array, in the case of pd.DataFrames axis=0 is the dimension that points downwards and axis=1 the one that points to the right.

Example: Think of an ndarray with shape (3,5,7).

a = np.ones((3,5,7))

a is a 3 dimensional ndarray, i.e. it has 3 axes ("axes" is plural of "axis"). The configuration of a will look like 3 slices of bread where each slice is of dimension 5-by-7. a[0,:,:] will refer to the 0-th slice, a[1,:,:] will refer to the 1-st slice etc.

a.sum(axis=0) will apply sum() along the 0-th axis of a. You will add all the slices and end up with one slice of shape (5,7).

a.sum(axis=0) is equivalent to

b = np.zeros((5,7))
for i in range(5):
    for j in range(7):
        b[i,j] += a[:,i,j].sum()

b and a.sum(axis=0) will both look like this

array([[ 3.,  3.,  3.,  3.,  3.,  3.,  3.],
       [ 3.,  3.,  3.,  3.,  3.,  3.,  3.],
       [ 3.,  3.,  3.,  3.,  3.,  3.,  3.],
       [ 3.,  3.,  3.,  3.,  3.,  3.,  3.],
       [ 3.,  3.,  3.,  3.,  3.,  3.,  3.]])

In a pd.DataFrame, axes work the same way as in numpy.arrays: axis=0 will apply sum() or any other reduction function for each column.

N.B. In @zhangxaochen's answer, I find the phrases "along the rows" and "along the columns" slightly confusing. axis=0 should refer to "along each column", and axis=1 "along each row".

查看更多
怪性笑人.
5楼-- · 2019-01-01 12:23

The easiest way for me to understand is to talk about whether you are calculating a statistic for each column (axis = 0) or each row (axis = 1). If you calculate a statistic, say a mean, with axis = 0 you will get that statistic for each column. So if each observation is a row and each variable is in a column, you would get the mean of each variable. If you set axis = 1 then you will calculate your statistic for each row. In our example, you would get the mean for each observation across all of your variables (perhaps you want the average of related measures).

axis = 0: by column = column-wise = along the rows

axis = 1: by row = row-wise = along the columns

查看更多
柔情千种
6楼-- · 2019-01-01 12:23

I understand this way :

Say if your operation requires traversing from left to right/right to left in a dataframe, you are apparently merging columns ie. you are operating on various columns. This is axis =1

Example

df = pd.DataFrame(np.arange(12).reshape(3,4),columns=['A', 'B', 'C', 'D'])
print(df)
   A  B   C   D
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11 

df.mean(axis=1)

0    1.5
1    5.5
2    9.5
dtype: float64

df.drop(['A','B'],axis=1,inplace=True)

    C   D
0   2   3
1   6   7
2  10  11

Point to note here is we are operating on columns

Similarly, if your operation requires traversing from top to bottom/bottom to top in a dataframe, you are merging rows. This is axis=0.

查看更多
ら面具成の殇う
7楼-- · 2019-01-01 12:24

The designer of pandas, Wes McKinney, used to work intensively on finance data. Think of columns as stock names and index as daily prices. You can then guess what the default behavior is (i.e., axis=0) with respect to this finance data. axis=1 can be simply thought as 'the other direction'.

For example, the statistics functions, such as mean(), sum(), describe(), count() all default to column-wise because it makes more sense to do them for each stock. sort_index(by=) also defaults to column. fillna(method='ffill') will fill along column because it is the same stock. dropna() defaults to row because you probably just want to discard the price on that day instead of throw away all prices of that stock.

Similarly, the square brackets indexing refers to the columns since it's more common to pick a stock instead of picking a day.

查看更多
登录 后发表回答