What does axis in pandas mean?

2019-01-01 12:04发布

Here is my code to generate a dataframe:

import pandas as pd
import numpy as np

dff = pd.DataFrame(np.random.randn(1,2),columns=list('AB'))

then I got the dataframe:

+------------+---------+--------+
|            |  A      |  B     |
+------------+---------+---------
|      0     | 0.626386| 1.52325|
+------------+---------+--------+

When I type the commmand :

dff.mean(axis=1)

I got :

0    1.074821
dtype: float64

According to the reference of pandas, axis=1 stands for columns and I expect the result of the command to be

A    0.626386
B    1.523255
dtype: float64

So here is my question: what does axis in pandas mean?

14条回答
路过你的时光
2楼-- · 2019-01-01 12:08

This is based on @Safak's answer. The best way to understand the axes in pandas/numpy is to create a 3d array and check the result of the sum function along the 3 different axes.

 a = np.ones((3,5,7))

a will be:

    array([[[1., 1., 1., 1., 1., 1., 1.],
    [1., 1., 1., 1., 1., 1., 1.],
    [1., 1., 1., 1., 1., 1., 1.],
    [1., 1., 1., 1., 1., 1., 1.],
    [1., 1., 1., 1., 1., 1., 1.]],

   [[1., 1., 1., 1., 1., 1., 1.],
    [1., 1., 1., 1., 1., 1., 1.],
    [1., 1., 1., 1., 1., 1., 1.],
    [1., 1., 1., 1., 1., 1., 1.],
    [1., 1., 1., 1., 1., 1., 1.]],

   [[1., 1., 1., 1., 1., 1., 1.],
    [1., 1., 1., 1., 1., 1., 1.],
    [1., 1., 1., 1., 1., 1., 1.],
    [1., 1., 1., 1., 1., 1., 1.],
    [1., 1., 1., 1., 1., 1., 1.]]])

Now check out the sum of elements of the array along each of the axes:

 x0 = np.sum(a,axis=0)
 x1 = np.sum(a,axis=1)
 x2 = np.sum(a,axis=2)

will give you the following results:

   x0 :
   array([[3., 3., 3., 3., 3., 3., 3.],
        [3., 3., 3., 3., 3., 3., 3.],
        [3., 3., 3., 3., 3., 3., 3.],
        [3., 3., 3., 3., 3., 3., 3.],
        [3., 3., 3., 3., 3., 3., 3.]])

   x1 : 
   array([[5., 5., 5., 5., 5., 5., 5.],
   [5., 5., 5., 5., 5., 5., 5.],
   [5., 5., 5., 5., 5., 5., 5.]])

  x2 :
   array([[7., 7., 7., 7., 7.],
        [7., 7., 7., 7., 7.],
        [7., 7., 7., 7., 7.]])
查看更多
谁念西风独自凉
3楼-- · 2019-01-01 12:08

My thinking : Axis = n, where n = 0, 1, etc. means that the matrix is collapsed (folded) along that axis. So in a 2D matrix, when you collapse along 0 (rows), you are really operating on one column at a time. Similarly for higher order matrices.

This is not the same as the normal reference to a dimension in a matrix, where 0 -> row and 1 -> column. Similarly for other dimensions in an N dimension array.

查看更多
孤独寂梦人
4楼-- · 2019-01-01 12:10

Axis in view of programming is the position in the shape tuple. Here is an example:

import numpy as np

a=np.arange(120).reshape(2,3,4,5)

a.shape
Out[3]: (2, 3, 4, 5)

np.sum(a,axis=0).shape
Out[4]: (3, 4, 5)

np.sum(a,axis=1).shape
Out[5]: (2, 4, 5)

np.sum(a,axis=2).shape
Out[6]: (2, 3, 5)

np.sum(a,axis=3).shape
Out[7]: (2, 3, 4)

Mean on the axis will cause that dimension to be removed.

Referring to the original question, the dff shape is (1,2). Using axis=1 will change the shape to (1,).

查看更多
弹指情弦暗扣
5楼-- · 2019-01-01 12:10

It means it took the mean based using each column, axis=0 would give you what you think, but axis=1 gives

 (0.626386+1.52325)/2
 1.075
查看更多
只靠听说
6楼-- · 2019-01-01 12:13

axis = 0 means up to down axis = 1 means left to right

sums[key] = lang_sets[key].iloc[:,1:].sum(axis=0)

Given example is taking sum of all the data in column == key.

查看更多
大哥的爱人
7楼-- · 2019-01-01 12:13

Arrays are designed with so-called axis=0 and rows positioned vertically versus axis=1 and columns positioned horizontally. Axis refers to the dimension of the array. Illustration

查看更多
登录 后发表回答