correlation matrix in python

Posted 2020-02-21 07:20

How do I calculate a correlation matrix in Python? I have an n-dimensional vector in which each element has 5 dimensions. For example, my vector looks like this:

[
 [0.1, .32, .2,  0.4, 0.8], 
 [.23, .18, .56, .61, .12], 
 [.9,   .3,  .6,  .5,  .3], 
 [.34, .75, .91, .19, .21]
] 

In this case the vector has 4 elements and each element has 5 dimensions. What is the easiest way to construct the matrix?

Thanks

Tags: python
4 answers
beautiful°
Reply #2 · 2020-02-21 07:47

Using numpy, you could use np.corrcoef:

In [88]: import numpy as np

In [89]: np.corrcoef([[0.1, .32, .2, 0.4, 0.8], [.23, .18, .56, .61, .12], [.9, .3, .6, .5, .3], [.34, .75, .91, .19, .21]])
Out[89]: 
array([[ 1.        , -0.35153114, -0.74736506, -0.48917666],
       [-0.35153114,  1.        ,  0.23810227,  0.15958285],
       [-0.74736506,  0.23810227,  1.        , -0.03960706],
       [-0.48917666,  0.15958285, -0.03960706,  1.        ]])
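To make clear what this call computes: by default np.corrcoef treats each row as one variable and each column as one observation, so the 4x4 result above is the correlation between the four elements. A minimal sketch that rebuilds the same matrix by hand (the names data, centered, cov, std, and manual_corr are only illustrative):

```python
import numpy as np

data = np.array([[0.1, .32, .2, 0.4, 0.8],
                 [.23, .18, .56, .61, .12],
                 [.9, .3, .6, .5, .3],
                 [.34, .75, .91, .19, .21]])

# Center each row: every row is one variable with 5 observations.
centered = data - data.mean(axis=1, keepdims=True)

# Sample covariance between every pair of rows.
cov = centered @ centered.T / (data.shape[1] - 1)

# Divide by the product of standard deviations to get correlations.
std = np.sqrt(np.diag(cov))
manual_corr = cov / np.outer(std, std)

# np.corrcoef with its default rowvar=True gives the same 4x4 matrix.
row_corr = np.corrcoef(data)
print(np.allclose(manual_corr, row_corr))
```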
够拽才男人
Reply #3 · 2020-02-21 07:55

As I almost missed that comment by @Anton Tarasenko, I'll provide a new answer. So given your array:

a = np.array([[0.1, .32, .2,  0.4, 0.8], 
             [.23, .18, .56, .61, .12], 
             [.9,   .3,  .6,  .5,  .3],  
             [.34, .75, .91, .19, .21]]) 

If you want the correlation matrix of your dimensions (columns), which I assume, you can use numpy (note the transpose!):

import numpy as np
print(np.corrcoef(a.T))

Or, if you already have the data in pandas:

import pandas as pd
print(pd.DataFrame(a).corr())

Both give the same values (pandas displays them as a labeled DataFrame rather than an array):

array([[ 1.        , -0.03783885,  0.34905716,  0.14648975, -0.34945863],
       [-0.03783885,  1.        ,  0.67888519, -0.96102583, -0.12757741],
       [ 0.34905716,  0.67888519,  1.        , -0.45104803, -0.80429469],
       [ 0.14648975, -0.96102583, -0.45104803,  1.        , -0.15132323],
       [-0.34945863, -0.12757741, -0.80429469, -0.15132323,  1.        ]])
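As an alternative to transposing, np.corrcoef accepts rowvar=False, which tells it that the columns are the variables. A short sketch showing that both spellings give the same 5x5 matrix (the names by_transpose and by_rowvar are only illustrative):

```python
import numpy as np

a = np.array([[0.1, .32, .2, 0.4, 0.8],
              [.23, .18, .56, .61, .12],
              [.9, .3, .6, .5, .3],
              [.34, .75, .91, .19, .21]])

# Option 1: transpose so that each original column becomes a row.
by_transpose = np.corrcoef(a.T)

# Option 2: keep the array as is and mark columns as the variables.
by_rowvar = np.corrcoef(a, rowvar=False)

print(by_rowvar.shape)                       # one row/column per dimension
print(np.allclose(by_transpose, by_rowvar))  # the two options agree
```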
时光不老,我们不散
Reply #4 · 2020-02-21 08:04

You can also store the matrix in an np.array first, so you don't have to write it out again each time you use it.

import numpy as np
a = np.array([[0.1, .32, .2,  0.4, 0.8],
              [.23, .18, .56, .61, .12],
              [.9,   .3,  .6,  .5,  .3],
              [.34, .75, .91, .19, .21]])
# Rows are treated as variables, so b is the 4x4 correlation between rows.
b = np.corrcoef(a)
print(b)
贼婆χ
Reply #5 · 2020-02-21 08:06

Here is a pretty good example of calculating a correlation matrix from multiple time series using Python. The included source code calculates the correlation matrix for a set of Forex currency pairs using Pandas, NumPy, and matplotlib, and produces a graph of the correlations.

Sample data is a set of historical data files, and the output is a single correlation matrix and a plot. The code is very well documented.
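The linked code is not reproduced here, so the following is only a rough sketch of the general idea, not that example's actual implementation. It assumes synthetic random-walk prices instead of historical data files, uses made-up pair names, and correlates percentage returns rather than raw prices (all of these are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical currency pairs; real code would load historical files instead.
pairs = ["EURUSD", "GBPUSD", "USDJPY"]

# Synthetic random-walk prices: 250 time steps for each pair.
rng = np.random.default_rng(0)
steps = rng.normal(0.0, 0.001, size=(250, len(pairs)))
prices = pd.DataFrame(100 * np.exp(np.cumsum(steps, axis=0)), columns=pairs)

# Correlate period-to-period returns (an assumption; prices could be used too).
returns = prices.pct_change().dropna()
corr = returns.corr()

print(corr.round(2))
```

A heatmap of corr could then be drawn with matplotlib, for example by passing corr.values to matplotlib.pyplot.imshow.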
