How is the numpy.cov() function implemented?

Posted 2020-07-08 07:58


I have my own implementation of the covariance function, based on the equation Cov(X, Y) = E(XY) - E(X)E(Y):

# Calculate the covariance coefficient between two variables.

import numpy as np

X = np.array([171, 184, 210, 198, 166, 167])
Y = np.array([78, 77, 98, 110, 80, 69])

# Expected value function.
def E(X, P):
    expectedValue = 0
    for i in np.arange(0, np.size(X)):
        expectedValue += X[i] * (P[i] / np.size(X))
    return expectedValue 

# Covariance coefficient function.
def covariance(X, Y):
    # Calculate the element-wise product of each pair of values.
    XY = X * Y

    # Calculate the expected values for each variable and for the XY.
    EX = E(X, np.ones(np.size(X)))
    EY = E(Y, np.ones(np.size(Y)))
    EXY = E(XY, np.ones(np.size(XY)))

    # Calculate the covariance coefficient.
    return EXY - (EX * EY)

# Display matrix of the covariance coefficient values.
covMatrix = np.array([[covariance(X, X), covariance(X, Y)],
                      [covariance(Y, X), covariance(Y, Y)]])
print("My function:", covMatrix)

# Display standard numpy.cov() covariance coefficient matrix.
print("Numpy.cov() function:", np.cov([X, Y]))

But the problem is that I'm getting different values from my function and from numpy.cov(), i.e.:

My function: [[ 273.88888889  190.61111111]
 [ 190.61111111  197.88888889]]
Numpy.cov() function: [[ 328.66666667  228.73333333]
 [ 228.73333333  237.46666667]]

Why is that? How is the numpy.cov() function implemented? If numpy.cov() is implemented correctly, what am I doing wrong? I'll just add that the results from my covariance() function are consistent with worked examples on the internet for calculating the covariance coefficient.


By default, numpy.cov() uses a different normalization from yours: it divides by N - 1 (the unbiased sample estimate), whereas your function divides by N. Try instead

>>> np.cov([X, Y], ddof=0)
array([[ 273.88888889,  190.61111111],
       [ 190.61111111,  197.88888889]])
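
To make the difference concrete, here is a minimal sketch of a covariance function with an adjustable divisor. The name cov_sketch is hypothetical and this is not numpy's actual source; it only assumes the textbook definition (sum of products of deviations from the mean, divided by N - ddof) and checks the result against np.cov for the data in the question.

```python
import numpy as np

X = np.array([171, 184, 210, 198, 166, 167])
Y = np.array([78, 77, 98, 110, 80, 69])

def cov_sketch(x, y, ddof=1):
    """Covariance of two 1-D samples, dividing by (N - ddof).

    Hypothetical helper for illustration; not numpy's implementation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Sum of products of deviations from the mean, then normalize.
    return np.sum((x - x.mean()) * (y - y.mean())) / (n - ddof)

n = X.size
population = cov_sketch(X, Y, ddof=0)  # divides by N, as E(XY) - E(X)E(Y) does
sample = cov_sketch(X, Y, ddof=1)      # divides by N - 1, as np.cov does by default

print("ddof=0:", population)
print("ddof=1:", sample)

# The two estimates differ only by the factor N / (N - 1).
print(np.isclose(sample, population * n / (n - 1)))
# Both agree with np.cov when the same ddof is used.
print(np.isclose(sample, np.cov(X, Y)[0, 1]))
print(np.isclose(population, np.cov(X, Y, ddof=0)[0, 1]))
```

So neither result is wrong: the two functions answer slightly different questions (population covariance versus the unbiased sample estimate), and with only six data points the factor 6/5 is large enough to be noticeable.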

