Here is my code to generate a dataframe:
import pandas as pd
import numpy as np
dff = pd.DataFrame(np.random.randn(1,2),columns=list('AB'))
Then I got this dataframe:
+------------+---------+--------+
| | A | B |
+------------+---------+--------+
| 0 | 0.626386| 1.52325|
+------------+---------+--------+
When I type the command:
dff.mean(axis=1)
I got:
0 1.074821
dtype: float64
According to the pandas reference, axis=1 stands for columns, so I expected the result of the command to be
A 0.626386
B 1.523255
dtype: float64
So here is my question: what does axis in pandas mean?
This is based on @Safak's answer. The best way to understand the axes in pandas/numpy is to create a 3d array and check the result of the sum function along the 3 different axes.
a will be:
Now check out the sum of elements of the array along each of the axes:
will give you the following results:
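A minimal sketch of this experiment, assuming a small 2x3x4 array filled with np.arange (the shape and the values are illustrative choices, not necessarily the ones used in the original answer):

```python
import numpy as np

# Hypothetical 3D array: 2 blocks, 3 rows, 4 columns, holding the values 0..23.
a = np.arange(24).reshape(2, 3, 4)

# Summing along an axis removes that axis from the shape.
s0 = a.sum(axis=0)  # adds the 2 blocks together           -> shape (3, 4)
s1 = a.sum(axis=1)  # adds the 3 rows inside each block    -> shape (2, 4)
s2 = a.sum(axis=2)  # adds the 4 columns inside each row   -> shape (2, 3)

print(a.shape, s0.shape, s1.shape, s2.shape)
```

In each case the length of the summed axis disappears from the shape and the other two lengths stay where they were.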
My thinking: axis = n, where n = 0, 1, etc., means that the matrix is collapsed (folded) along that axis. So in a 2D matrix, when you collapse along axis 0 (the rows), you are really operating on one column at a time. The same holds for higher-order matrices.
This is not the same as the usual way of referring to a dimension in a matrix, where 0 -> row and 1 -> column. The same goes for the other dimensions of an N-dimensional array.
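To make the "collapse" idea concrete, here is a small sketch on a 2x3 matrix. The matrix m and the variable names are made up for illustration:

```python
import numpy as np

# Hypothetical 2x3 matrix: 2 rows, 3 columns.
m = np.array([[1, 2, 3],
              [4, 5, 6]])

# axis=0 collapses the rows, so one value is left per column.
col_totals = m.sum(axis=0)

# axis=1 collapses the columns, so one value is left per row.
row_totals = m.sum(axis=1)

print(col_totals.tolist())  # [5, 7, 9]
print(row_totals.tolist())  # [6, 15]
```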
From a programming point of view, an axis is a position in the shape tuple. Here is an example:
Taking the mean along an axis removes that dimension.
Referring to the original question, the dff shape is (1,2). Using axis=1 will change the shape to (1,).
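The shape-tuple view can be checked with a short sketch. It assumes a zero-filled array with the same (1, 2) shape as dff, plus a second, arbitrary (3, 4, 5) array to show that the rule is general:

```python
import numpy as np

# Same shape as dff in the question: 1 row, 2 columns.
x = np.zeros((1, 2))

shape_axis0 = x.mean(axis=0).shape  # position 0 of (1, 2) is removed
shape_axis1 = x.mean(axis=1).shape  # position 1 of (1, 2) is removed

print(x.shape)      # (1, 2)
print(shape_axis0)  # (2,)
print(shape_axis1)  # (1,)

# Hypothetical 3D case: removing position 1 from (3, 4, 5).
y = np.zeros((3, 4, 5))
shape_3d = y.mean(axis=1).shape
print(shape_3d)     # (3, 5)
```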
It means the mean was taken across the columns within each row. axis=0 would give you what you expected, but axis=1 gives one mean per row, which is the 1.074821 shown in the question.
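A sketch that reproduces both results, assuming the frame is rebuilt from the rounded values printed in the question instead of from np.random.randn:

```python
import pandas as pd

# Rebuild the question's frame from its printed (rounded) values.
dff = pd.DataFrame([[0.626386, 1.523255]], columns=list("AB"))

# axis=0: move down the rows, so the result has one entry per column (A, B).
per_column = dff.mean(axis=0)

# axis=1: move across the columns, so the result has one entry per row (0).
per_row = dff.mean(axis=1)

print(per_column)
print(per_row)
```

With a single row, the per-column means are just the values themselves, which is the output the question expected.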
axis = 0 means top to bottom; axis = 1 means left to right.
The given example takes the sum of all the data in column == key.
Arrays are laid out so that axis=0 runs vertically, down the rows, while axis=1 runs horizontally, across the columns. An axis is simply one of the array's dimensions.