Correlation coefficient on gnuplot

Posted 2019-04-10 20:35

Question:

I want to fit my data with the function f(x) = a + b*x**2 and plot it. After running the fit, I get this result:

correlation matrix of the fit parameters:

               m      n      
m               1.000 
n              -0.935  1.000 

My question is: how can I find the correlation coefficient in gnuplot?

Answer 1:

If you're looking for a way to calculate the correlation coefficient as defined on this page, you are out of luck with gnuplot, as explained in this Google Groups thread.

There are lots of other tools for calculating correlation coefficients, e.g. numpy.
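As a minimal sketch of the numpy route, assuming the x and y values are already in two arrays (the numbers below are made up for illustration), numpy.corrcoef returns the 2x2 correlation matrix and its off-diagonal entry is Pearson's r. The same value is also computed directly from the definition as a cross-check:

```python
import numpy as np

# Illustrative data only; replace with your own x and y columns.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# np.corrcoef returns the 2x2 correlation matrix of the two variables;
# the off-diagonal element [0, 1] is Pearson's r between x and y.
r = np.corrcoef(x, y)[0, 1]

# The same value from the definition: sum of products of deviations
# divided by the square root of the product of the sums of squares.
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(f"r = {r:.4f}")
```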



Answer 2:

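You can use the stats command in gnuplot, whose syntax is similar to that of the plot command. Assuming column 1 holds x and column 2 holds the measured y values, the following correlates the measured values with the fitted values f(x):

stats "file.dat" using 2:(f($1)) name "A"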

The correlation coefficient is then stored in the variable A_correlation. You can print it on the screen with print A_correlation, or show it on the graph with the set label command:

set label 1 sprintf("r = %4.2f",A_correlation) at graph 0.1, graph 0.85

You can find more about the stats command in the gnuplot documentation.
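The correlation reported by stats is the linear (Pearson) correlation coefficient. Purely as an illustration, not gnuplot's actual source, here is a Python sketch that accumulates the same quantity in one pass from running sums, correlating made-up measured y values with the fitted values of a hypothetical fit f(x) = a + b*x**2 whose parameters a and b are chosen by hand:

```python
import math


def pearson_from_sums(pairs):
    """Pearson's r accumulated in a single pass from running sums."""
    n = 0
    sx = sy = sxx = syy = sxy = 0.0
    for u, v in pairs:
        n += 1
        sx += u
        sy += v
        sxx += u * u
        syy += v * v
        sxy += u * v
    # These are n^2 times the covariance and the two variances,
    # so the factors of n cancel in the ratio.
    cov = n * sxy - sx * sy
    var_u = n * sxx - sx * sx
    var_v = n * syy - sy * sy
    return cov / math.sqrt(var_u * var_v)


# Hypothetical fitted model with hand-picked parameters.
a, b = 1.0, 0.5


def f(x):
    return a + b * x ** 2


# Made-up (x, measured y) pairs.
data = [(0.0, 1.1), (1.0, 1.4), (2.0, 3.1), (3.0, 5.4), (4.0, 9.2)]

# Correlate measured y with fitted f(x).
r = pearson_from_sums((y, f(x)) for x, y in data)
print(f"r = {r:.4f}")
```

This textbook sum-based form can lose precision when the values share a large common offset; subtracting the means first (a two-pass computation) is safer in that case.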



Answer 3:

Although there is no direct solution, a workaround is possible; I'll illustrate it using Python/NumPy. First, the part of the gnuplot script that performs the fit and calls a Python script:

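    file = "my_data.tsv"
    a = 1; b = 1                  # starting values for the fit parameters
    f(x) = a + b*x
    fit f(x) file using 2:3 via a,b
    # Run the external script; its printed output is returned as the string r
    r = system(sprintf("python correlation.py %s", file))
    ti = sprintf("y = %.2f + %.2fx (r = %s)", a, b, r)
    plot \
      file using 2:3 notitle,\
      f(x) title ti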

This runs correlation.py, which prints the correlation coefficient; gnuplot captures that output as the string r and uses it in the title of the fit line. Here is correlation.py:

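    import sys
    from numpy import genfromtxt, corrcoef

    # Read the tab-separated file given as the first command-line argument.
    data = genfromtxt(sys.argv[1], delimiter='\t')
    # Skip the header row (row 0) and correlate the columns with
    # zero-based indices 1 and 2 (gnuplot's columns 2 and 3).
    r = corrcoef(data[1:, 1], data[1:, 2])[0, 1]
    # Print r with three decimals and no leading zero, e.g. ".592".
    print(("%.3f" % r).lstrip('0'))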

Here, the first row is assumed to be a header row, and the columns to correlate are hard-coded as indices 1 and 2 (zero-based, i.e. gnuplot's columns 2 and 3). Both settings could also be turned into command-line arguments.
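A sketch of such a variant, which takes the file name, the two zero-based column indices and the number of header rows as arguments. The argument order and the --skip-header option name are my own choices, not part of the original script; from gnuplot it would be called as, e.g., system(sprintf("python correlation.py %s 1 2", file)).

```python
import argparse
import sys

import numpy as np


def correlation(path, col_x, col_y, skip_header=1, delimiter="\t"):
    """Pearson's r between two zero-based columns of a delimited text file."""
    data = np.genfromtxt(path, delimiter=delimiter, skip_header=skip_header)
    return np.corrcoef(data[:, col_x], data[:, col_y])[0, 1]


def format_r(r):
    """Three decimals without the leading zero, e.g. '.592' or '-.592'."""
    s = "%.3f" % r
    if s.lstrip("-").startswith("0."):
        s = s.replace("0.", ".", 1)
    return s


def main(argv):
    parser = argparse.ArgumentParser(description="Print Pearson's r of two columns.")
    parser.add_argument("file")
    parser.add_argument("col_x", type=int, help="zero-based index of the first column")
    parser.add_argument("col_y", type=int, help="zero-based index of the second column")
    parser.add_argument("--skip-header", type=int, default=1,
                        help="number of header rows to skip (default: 1)")
    args = parser.parse_args(argv)
    r = correlation(args.file, args.col_x, args.col_y, args.skip_header)
    print(format_r(r))


# Only run when called with arguments, e.g. via gnuplot's system().
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Unlike the lstrip('0') version, format_r also drops the leading zero of negative values ("-.592").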

For one of my own data sets, the resulting title of the fit line is:

y = 2.15 + 1.58x (r = .592)


Answer 4:

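Since you are probably using the fit command, you can first refer to this link to arrive at R^2 values. It relies on variables that fit sets, such as FIT_WSSR and FIT_NDF. With R^2 = 1 - SSE/SST, the residual sum of squares SSE is FIT_WSSR (for an unweighted fit), and the total sum of squares SST can be obtained with stats, assuming column 2 holds the y values:

stats "file.dat" using 2 name "Y" nooutput
SST = Y_sumsq - Y_sum**2/Y_records   # summed squared deviations of y from its mean
SSE = FIT_WSSR                       # residual sum of squares from the last fit
R2 = 1 - SSE/SST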
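For reference, the usual definition is R^2 = 1 - SS_res/SS_tot, where SS_res is the residual sum of squares (what FIT_WSSR holds for an unweighted fit) and SS_tot is the sum of squared deviations of y from its mean. A NumPy sketch of that definition, using np.polyfit as a stand-in for gnuplot's fit and made-up data, which also checks that for a straight-line fit with an intercept R^2 equals the square of Pearson's r:

```python
import numpy as np

# Made-up data for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 6.0])

# Least-squares straight line y = a + b*x
# (np.polyfit returns the highest-degree coefficient first).
b, a = np.polyfit(x, y, 1)
y_fit = a + b * x

ss_res = np.sum((y - y_fit) ** 2)     # residual sum of squares (FIT_WSSR analogue)
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares about the mean
r2 = 1.0 - ss_res / ss_tot

# For a straight-line least-squares fit with an intercept,
# R^2 equals the square of Pearson's r between x and y.
r = np.corrcoef(x, y)[0, 1]
print(f"R^2 = {r2:.4f}, r^2 = {r ** 2:.4f}")
```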

The next step is to show the R^2 value on the graph, which can be done with:

set label 1 sprintf("R^2 = %.3f", R2) at graph 0.7, graph 0.7