I am looking for a function that takes as input two lists, and returns the Pearson correlation, and the significance of the correlation.
相关问题
- how to define constructor for Python's new Nam
- streaming md5sum of contents of a large remote tar
- How to get the background from multiple images by
- Evil ctypes hack in python
- Correctly parse PDF paragraphs with Python
Here is an implementation for pearson correlation based on sparse vector. The vectors here are expressed as a list of tuples expressed as (index, value). The two sparse vectors can be of different length but over all vector size will have to be same. This is useful for text mining applications where the vector size is extremely large due to most features being bag of words and hence calculations are usually performed using sparse vectors.
Unit tests:
I have a very simple and easy to understand solution for this. For two arrays of equal length, Pearson coefficient can be easily computed as follows:
Hmm, many of these responses have long and hard to read code...
I'd suggest using numpy with its nifty features when working with arrays:
You can take a look at this article. This is a well-documented example for calculating correlation based on historical forex currency pairs data from multiple files using pandas library (for Python), and then generating a heatmap plot using seaborn library.
http://www.tradinggeeks.net/2015/08/calculating-correlation-in-python/
This is a implementation of Pearson Correlation function using numpy:
The Pearson correlation can be calculated with numpy's
corrcoef
.