Here is my code to generate a dataframe:
import pandas as pd
import numpy as np
dff = pd.DataFrame(np.random.randn(1,2),columns=list('AB'))
Then I got the dataframe:
+------------+----------+----------+
|            |    A     |    B     |
+------------+----------+----------+
|     0      | 0.626386 | 1.523255 |
+------------+----------+----------+
When I type the command:
dff.mean(axis=1)
I got:
0 1.074821
dtype: float64
According to the pandas reference, axis=1 stands for columns, so I expected the result of the command to be
A 0.626386
B 1.523255
dtype: float64
So here is my question: what does axis in pandas mean?
I'm a newbie to pandas, but this is how I understand axis in pandas:
Axis   Constant   Varying   Direction
0      Column     Row       Downwards      |
1      Row        Column    Towards right  -->
So to compute the mean of a column, that particular column is held constant while the rows under it can change (varying), so it is axis=0.
Similarly, to compute the mean of a row, that particular row is held constant while the computation traverses the different columns (varying), so it is axis=1.
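A minimal sketch to check this reading, assuming a DataFrame built from the question's printed values instead of np.random.randn so the result is reproducible: axis=0 collapses the rows and leaves one mean per column, while axis=1 collapses the columns and leaves one mean per row. DataFrame.mean also accepts the string aliases 'index' and 'columns' for 0 and 1.

```python
import pandas as pd

# Same shape as the question's dff (1 row, columns A and B), but with fixed values
dff = pd.DataFrame([[0.626386, 1.523255]], columns=list("AB"))

# axis=0: the column is constant, the rows vary -> one mean per column (A and B)
col_means = dff.mean(axis=0)

# axis=1: the row is constant, the columns vary -> one mean per row (index 0)
row_means = dff.mean(axis=1)

# The string aliases refer to the same axes as the integers
assert col_means.equals(dff.mean(axis="index"))
assert row_means.equals(dff.mean(axis="columns"))

print(col_means)  # one value per column, indexed by A and B
print(row_means)  # one value per row, indexed by 0
```

With a single row, the per-column means are just the row's own values, which is the output the question expected from axis=1.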
It specifies the axis along which the means are computed. By default, axis=0. This is consistent with numpy.mean usage when axis is specified explicitly (in numpy.mean, axis=None by default, which computes the mean value over the flattened array), in which axis=0 means along the rows (namely, the index in pandas) and axis=1 along the columns. For added clarity, one may choose to specify axis='index' (instead of axis=0) or axis='columns' (instead of axis=1).

axis refers to the dimension of the array. In the case of pd.DataFrames, axis=0 is the dimension that points downwards and axis=1 the one that points to the right.

Example: Think of an ndarray a with shape (3,5,7). a is a 3-dimensional ndarray, i.e. it has 3 axes ("axes" is the plural of "axis"). The configuration of a will look like 3 slices of bread where each slice is of dimension 5-by-7. a[0,:,:] refers to the 0-th slice, a[1,:,:] refers to the 1st slice, etc.

a.sum(axis=0) applies sum() along the 0-th axis of a. You add all the slices together and end up with one slice of shape (5,7), i.e. a.sum(axis=0) is equivalent to a[0,:,:] + a[1,:,:] + a[2,:,:].

In a pd.DataFrame, axes work the same way as in numpy.arrays: axis=0 applies sum() or any other reduction function to each column.

N.B. In @zhangxaochen's answer, I find the phrases "along the rows" and "along the columns" slightly confusing.
axis=0 should refer to "along each column", and axis=1 "along each row".

The easiest way for me to understand is to talk about whether you are calculating a statistic for each column (axis = 0) or each row (axis = 1). If you calculate a statistic, say a mean, with axis = 0, you will get that statistic for each column. So if each observation is a row and each variable is in a column, you would get the mean of each variable. If you set axis = 1, then you will calculate your statistic for each row. In our example, you would get the mean for each observation across all of your variables (perhaps you want the average of related measures).

axis = 0: by column = column-wise = along the rows
axis = 1: by row = row-wise = along the columns

I understand it this way:
Say your operation requires traversing from left to right (or right to left) in a dataframe: you are combining columns, i.e. you are operating on various columns. This is axis=1.
Similarly, if your operation requires traversing from top to bottom (or bottom to top) in a dataframe, you are combining rows. This is axis=0.
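A small check of this left/right versus top/bottom picture, assuming a made-up 2-by-3 DataFrame: adding the columns together by hand within each row matches sum(axis=1), and adding the rows together within each column matches sum(axis=0).

```python
import pandas as pd

# Hypothetical example data: 2 rows (index 0, 1) x 3 columns (A, B, C)
df = pd.DataFrame({"A": [1, 2], "B": [10, 20], "C": [100, 200]})

# Left to right: combine the columns within each row -> axis=1
across_columns = df.sum(axis=1)
manual_across = df["A"] + df["B"] + df["C"]
assert across_columns.equals(manual_across)

# Top to bottom: combine the rows within each column -> axis=0
down_rows = df.sum(axis=0)
manual_down = df.iloc[0] + df.iloc[1]
assert down_rows.equals(manual_down)

print(across_columns.tolist())  # [111, 222]
print(down_rows.tolist())       # [3, 30, 300]
```

The axis=1 result is indexed like the rows, and the axis=0 result is indexed like the columns: the axis you name is the one that disappears.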
The designer of pandas, Wes McKinney, used to work intensively on finance data. Think of columns as stock names and the index as the dates of daily prices. You can then guess what the default behavior is (i.e., axis=0) with respect to this finance data. axis=1 can simply be thought of as 'the other direction'.

For example, the statistics functions, such as mean(), sum(), describe() and count(), all default to column-wise because it makes more sense to compute them for each stock. sort_index(by=) (sort_values(by=) in current pandas) also defaults to column. fillna(method='ffill') will fill along the column because it is the same stock. dropna() defaults to row because you probably just want to discard the prices of that day instead of throwing away all prices of that stock.

Similarly, square-bracket indexing refers to the columns since it's more common to pick a stock than to pick a day.
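A sketch of the stock-price reading, with hypothetical tickers (AAA, BBB) and made-up prices: columns are stocks, the index is trading days. The sketch uses DataFrame.ffill(), which performs the same forward fill as fillna(method='ffill').

```python
import numpy as np
import pandas as pd

# Hypothetical data: columns are stock names, the index is trading days
prices = pd.DataFrame(
    {"AAA": [10.0, np.nan, 12.0], "BBB": [50.0, 51.0, 52.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)

# Statistics default to axis=0: one result per stock (column), NaN skipped
per_stock_mean = prices.mean()

# Forward fill runs down each column, i.e. within the same stock
filled = prices.ffill()

# dropna() defaults to axis=0 and drops rows, i.e. the day with a missing price
complete_days = prices.dropna()

# Square brackets select a column, i.e. one stock
bbb = prices["BBB"]

print(per_stock_mean)
print(complete_days)
```

Passing axis=1 to any of these calls would switch to 'the other direction': per-day statistics, filling across stocks, or dropping a whole stock.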