NumPy: zero mean data and standardization

I saw in a tutorial (with no further explanation) that we can center data to zero mean with x -= np.mean(x, axis=0) and normalize it with x /= np.std(x, axis=0). Can anyone elaborate on these two pieces of code? The only thing I got from the documentation is that np.mean calculates the arithmetic mean along a specified axis and np.std does the same for the standard deviation.

Tags: python numpy image-preprocessing

4 answers

虎瘦雄心在

#2 · 2020-08-13 06:12

Follow the comments in the code below:

    import numpy as np

    # create x as a float array so the in-place operations below work
    x = np.asarray([1, 2, 3, 4], dtype=np.float64)

    np.mean(x)       # calculates the mean of the array x
    x - np.mean(x)   # equivalent to subtracting the mean of x from each value in x
    x -= np.mean(x)  # the -= can be read as x = x - np.mean(x)

    np.std(x)        # calculates the standard deviation of the array
    x /= np.std(x)   # the /= can be read as x = x / np.std(x)


等我变得足够好

#3 · 2020-08-13 06:33

This is also called the z-score.

SciPy has a utility for it:

    >>> from scipy import stats
    >>> stats.zscore([ 0.7972,  0.0767,  0.4383,  0.7866,  0.8091,
    ...                0.1954,  0.6307,  0.6599,  0.1065,  0.0508])
    array([ 1.1273, -1.247 , -0.0552,  1.0923,  1.1664, -0.8559,  0.5786,
            0.6748, -1.1488, -1.3324])
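For comparison, here is a minimal NumPy-only sketch of the same computation on the same numbers. It assumes the default population standard deviation (ddof=0), which is what np.std uses unless told otherwise; the variable names are only illustrative.

```python
import numpy as np

data = np.array([0.7972, 0.0767, 0.4383, 0.7866, 0.8091,
                 0.1954, 0.6307, 0.6599, 0.1065, 0.0508])

# Subtract the mean, then divide by the population standard deviation
# (np.std defaults to ddof=0).
z = (data - np.mean(data)) / np.std(data)

print(np.round(z, 4))
```

The result should agree with the zscore output above to the printed precision.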


傲

#4 · 2020-08-13 06:35

The key here is the augmented assignment operators. They perform an operation on the original variable: a += c is essentially equal to a = a + c (for NumPy arrays the update is done in place).

So indeed a (in your case x) has to be defined beforehand.
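One practical difference between the two spellings may be worth seeing. The sketch below uses made-up arrays to show that x -= ... modifies the existing array while x = x - ... creates a new one, and that an in-place update on an integer array fails because the float result cannot be stored back into it.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
alias = x            # second name for the same array object

x -= np.mean(x)      # in place: the existing buffer is modified
print(alias)         # alias sees the centered values too

y = np.array([1.0, 2.0, 3.0, 4.0])
alias_y = y
y = y - np.mean(y)   # builds a new array and rebinds the name y
print(alias_y)       # alias_y still holds the original values

# In-place operations cannot change the dtype, so centering an
# integer array in place raises an error.
ints = np.array([1, 2, 3, 4])
try:
    ints -= np.mean(ints)
except TypeError as exc:
    print("in-place centering failed:", type(exc).__name__)
```

This is why it is usually safest to create x with a float dtype before centering it in place.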

Each function takes an array or iterable (x) as input and returns a value (or an array, if the input was multidimensional and an axis was given), which is then used in your assignment operations.
With axis=0, the mean or std is computed down the rows: for each column, you take the values from every row and compute their mean or std, giving one result per column. axis=1 would instead take the values from every column of a given row, giving one result per row.
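A small sketch of the axis argument, assuming a toy 3x2 array in which rows are samples and columns are features (the layout is an assumption made for illustration):

```python
import numpy as np

# 3 rows (samples) by 2 columns (features)
x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

col_means = np.mean(x, axis=0)  # collapses the rows: one value per column
row_means = np.mean(x, axis=1)  # collapses the columns: one value per row

print(col_means)  # column means: 2.0 and 20.0
print(row_means)  # row means: 5.5, 11.0 and 16.5
```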

With these two operations you first remove the mean, so that each column is centered on 0. Then, when you divide by the std, you rescale the spread of the data around zero so that each column has a standard deviation of 1. For roughly bell-shaped data most values then lie within a few units of 0, though they are not confined to [-1, +1].

So now, each of your column values is centered around zero and standardized.
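To check this numerically, here is a hedged sketch on synthetic data. The random generator, seed, and the three feature scales are arbitrary choices for illustration, not anything from the tutorial in the question.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
# 1000 samples, 3 features with different offsets and spreads
x = rng.normal(loc=[5.0, -2.0, 100.0], scale=[1.0, 0.1, 25.0], size=(1000, 3))

x -= np.mean(x, axis=0)
x /= np.std(x, axis=0)

print(np.mean(x, axis=0))  # each entry is ~0 up to floating-point error
print(np.std(x, axis=0))   # each entry is ~1
print(np.abs(x).max())     # larger than 1: values are not confined to [-1, 1]
```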

There are other scaling techniques, such as min-max scaling: subtracting the minimum (or maximum) value and dividing by the range of values.
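As an illustration of that alternative, a minimal min-max scaling sketch per column. It assumes every column has a non-zero range (a constant column would cause a division by zero), and the array is made up.

```python
import numpy as np

x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [4.0, 50.0]])

col_min = np.min(x, axis=0)
col_range = np.max(x, axis=0) - col_min  # assumed non-zero for every column

scaled = (x - col_min) / col_range       # each column now spans 0 to 1
print(scaled)
```

Unlike standardization, this does bound every value to the interval [0, 1], but it is more sensitive to outliers because a single extreme value sets the range.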


贪生不怕死

#5 · 2020-08-13 06:39

From the syntax you give, I conclude that your array is multidimensional. I will therefore first discuss the case where x is just a one-dimensional array:

np.mean(x) computes the mean; through broadcasting, x - np.mean(x) subtracts the mean of x from every entry. x -= np.mean(x, axis=0) is equivalent to x = x - np.mean(x, axis=0). The same holds for x / np.std(x).

In the case of multidimensional arrays the same thing happens, but instead of computing the mean over the entire array, you compute it only along the first "axis". Axis is the NumPy word for dimension. So if your x is two-dimensional, then np.mean(x, axis=0) = [np.mean(x[:, 0]), np.mean(x[:, 1]), ...]. Broadcasting again ensures that the subtraction is applied to all elements.

Note that this works directly only for the first dimension; for other axes the shapes will in general not match for broadcasting. If you want to normalize with respect to another axis n, you need to do something like:

x -= np.expand_dims(np.mean(x, axis=n), n)
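An alternative sketch, not from the answer above: np.mean and np.std accept keepdims=True, which leaves the reduced axis in place with length 1 so that the result broadcasts without an explicit expand_dims. The 3x4 array here is an arbitrary example.

```python
import numpy as np

x = np.arange(12, dtype=np.float64).reshape(3, 4)

# keepdims=True keeps the reduced axis with length 1, so the result has
# shape (3, 1) and broadcasts against x's shape (3, 4).
row_means = np.mean(x, axis=1, keepdims=True)
x -= row_means
x /= np.std(x, axis=1, keepdims=True)

print(row_means.shape)
print(np.mean(x, axis=1))  # each row is now centered on 0
```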
