I have two arrays of data, heights and weights:
import numpy as np, matplotlib.pyplot as plt
heights = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65])
weights = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45])
plt.plot(heights,weights,'bo')
plt.show()
I want to produce a plot similar to this:
http://www.sas.com/en_us/software/analytics/stat.html#m=screenshot6
Any ideas are appreciated.
An update to pylang's great answer, in response to PJW: if you're trying to fit a polynomial of higher than first order, the calculation of y2 needs to be updated from:
to
The original code only works for a first-order polynomial (that is, a straight line).
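A minimal sketch of the idea, assuming the fit comes from np.polyfit and the model is wrapped in an equation helper (the names equation, degree, x2, y2 and y2_line_only below are illustrative, not necessarily the original code). For any polynomial degree, y2 should be the fitted model evaluated on a dense grid x2. A straight ramp between the smallest and largest fitted values is only correct when the model is a straight line.

```python
import numpy as np

heights = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65])
weights = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45])

def equation(p, x):
    """Evaluate the polynomial with coefficients p (highest power first) at x."""
    return np.polyval(p, x)

degree = 2                                # assumption: a quadratic fit, for illustration
p = np.polyfit(heights, weights, degree)  # least-squares coefficients
y_model = equation(p, heights)            # fitted values at the observed x

# Dense grid over the observed x range, used to draw the curve and bands.
x2 = np.linspace(heights.min(), heights.max(), 100)

# Shortcut that is only valid for a straight line: a ramp between extreme fitted values.
y2_line_only = np.linspace(y_model.min(), y_model.max(), 100)

# General version: evaluate the fitted model itself on the grid.
y2 = equation(p, x2)
```

Because y2 is computed from the model rather than from the range of y_model, the same line works unchanged if degree is set to 1, 2 or higher.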
In response to tryptofan's comment: yes, to get a two-tailed 95% t-statistic the code should be updated from stats.t.ppf(0.95, n - m) to stats.t.ppf(0.975, n - m).
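A small sketch of why 0.975 is the right argument, assuming n = 19 observations (the data in the question) and m = 2 parameters for a straight-line fit. scipy.stats.t.ppf takes a lower-tail probability, so a two-sided 95% interval needs the 97.5% quantile, which leaves 2.5% in each tail.

```python
from scipy import stats

n, m = 19, 2   # 19 observations, 2 parameters (slope and intercept)
dof = n - m    # 17 degrees of freedom

# ppf is the inverse CDF: it returns t such that P(T <= t) = q.
t_one_sided_95 = stats.t.ppf(0.95, dof)   # 5% left in the upper tail only
t_two_sided_95 = stats.t.ppf(0.975, dof)  # 2.5% in each tail, 95% in the middle

# Check: the central area between -t and +t is 95% for the two-sided value.
central = stats.t.cdf(t_two_sided_95, dof) - stats.t.cdf(-t_two_sided_95, dof)
print(t_one_sided_95, t_two_sided_95, central)
```

The one-sided value is smaller, so using 0.95 produces bands that are too narrow for a two-sided 95% claim.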
You can use the seaborn plotting library to create plots like this.
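A minimal sketch with seaborn's regplot, which draws the scatter, the least-squares line and a confidence band in one call. Note that seaborn estimates this band by bootstrapping, so it will be close to, but not identical with, the analytic t-based band. The Agg backend line, the axis labels and the seed value are assumptions for a reproducible, headless run.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line and call plt.show() for a window
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

heights = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65])
weights = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45])

fig, ax = plt.subplots()
# Scatter + fitted line + bootstrapped 95% confidence band for the regression line.
sns.regplot(x=heights, y=weights, ci=95, n_boot=1000, seed=0, ax=ax)
ax.set_xlabel("Height")
ax.set_ylabel("Weight")
```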
Here's what I put together. I tried to closely emulate your screenshot.
Given
Some detailed helper functions for plotting confidence intervals.
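A minimal sketch of such a helper for the straight-line case, using the textbook confidence band for the mean response, t * s_err * sqrt(1/n + (x0 - mean(x))**2 / sum((x - mean(x))**2)). The name plot_ci_manual, its signature and the returned half-width array are illustrative assumptions; the helper in the answer may differ in detail.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

def plot_ci_manual(t, s_err, n, x, x2, y2, ax=None):
    """Shade the confidence band for the mean of a straight-line fit.

    Returns the axes and the band half-width evaluated on x2.
    """
    if ax is None:
        ax = plt.gca()
    x = np.asarray(x, dtype=float)
    sxx = np.sum((x - x.mean()) ** 2)
    ci = t * s_err * np.sqrt(1 / n + (x2 - x.mean()) ** 2 / sxx)
    ax.fill_between(x2, y2 - ci, y2 + ci, color="#b9cfe7", alpha=0.6,
                    label="95% confidence band")
    return ax, ci

heights = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65])
weights = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45])

p = np.polyfit(heights, weights, 1)             # slope, intercept
n = heights.size
resid = weights - np.polyval(p, heights)
s_err = np.sqrt(np.sum(resid ** 2) / (n - 2))   # residual standard error
t = stats.t.ppf(0.975, n - 2)                   # two-sided 95%

x2 = np.linspace(heights.min(), heights.max(), 100)
y2 = np.polyval(p, x2)

fig, ax = plt.subplots()
ax.plot(heights, weights, "o", label="Data")
ax.plot(x2, y2, "-", label="Fit")
ax, ci = plot_ci_manual(t, s_err, n, heights, x2, y2, ax=ax)
```

The band is narrowest at the mean of x and widens towards the ends of the data range, which gives the characteristic "bow tie" shape.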
Code
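A sketch of the main computation for the straight-line case, using only NumPy and SciPy. The variable names follow those mentioned in this thread (p, y_model, n, m, dof, t, s_err, x2, y2); the prediction band pi is an addition for illustration. It uses the standard formula with an extra 1 under the square root, which accounts for the scatter of a single new observation around the line, so it is always wider than the confidence band ci.

```python
import numpy as np
from scipy import stats

heights = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65])
weights = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45])
x, y = heights.astype(float), weights.astype(float)

p = np.polyfit(x, y, 1)        # slope, intercept
y_model = np.polyval(p, x)     # fitted values

n = y.size                     # number of observations
m = p.size                     # number of fitted parameters
dof = n - m                    # degrees of freedom
t = stats.t.ppf(0.975, dof)    # two-sided 95% critical value

resid = y - y_model
s_err = np.sqrt(np.sum(resid ** 2) / dof)   # residual standard error

x2 = np.linspace(x.min(), x.max(), 100)
y2 = np.polyval(p, x2)
lever = 1 / n + (x2 - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)
ci = t * s_err * np.sqrt(lever)       # half-width: band for the mean response
pi = t * s_err * np.sqrt(1 + lever)   # half-width: band for a new observation
```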
Output

Using plot_ci_manual():

Using plot_ci_bootstrap():

Hope this helps. Cheers.
Details
I believe that because the legend is outside the figure, it does not show up in matplotlib's pop-up window. It works fine in Jupyter using %matplotlib inline.
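If the cause is indeed the legend sitting outside the axes, two standard matplotlib options can keep it visible: reserve space for it with subplots_adjust in the on-screen window, and pass bbox_inches="tight" when saving. The sketch below is an assumption about a fix, not part of the answer; the right=0.75 margin, the legend anchor and the in-memory buffer are arbitrary choices.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(45, 76, 20)   # illustrative x values
fig, ax = plt.subplots()
ax.plot(x, 1.5 * x - 40, "-", label="Fit")

# Place the legend just to the right of the axes, i.e. outside the plot area.
lgd = ax.legend(loc="upper left", bbox_to_anchor=(1.02, 1.0))

# On screen: keep the axes in the left 75% of the figure, leaving room for the legend.
fig.subplots_adjust(right=0.75)

# When saving: bbox_inches="tight" enlarges the saved area to include the legend.
buf = io.BytesIO()
fig.savefig(buf, format="png", bbox_inches="tight")
```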
The primary confidence interval code (plot_ci_manual()) is adapted from another source that produces a plot similar to the OP's. You can select a more advanced technique called residual bootstrapping by uncommenting the second option, plot_ci_bootstrap().
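A minimal sketch of residual bootstrapping for a polynomial fit: keep the fitted values, resample the residuals with replacement, add them back, refit, and collect the refitted curves. The function name residual_bootstrap_band, its parameters and the percentile summary are hypothetical; this sketch condenses the bootstrap curves into a percentile band, whereas the answer's plot_ci_bootstrap() may draw them differently.

```python
import numpy as np

def residual_bootstrap_band(x, y, x2, degree=1, n_boot=500, level=95, seed=0):
    """Return the fitted curve on x2 and a percentile band from residual bootstrapping."""
    rng = np.random.default_rng(seed)
    p = np.polyfit(x, y, degree)
    y_model = np.polyval(p, x)
    resid = y - y_model
    curves = np.empty((n_boot, x2.size))
    for b in range(n_boot):
        # Resample residuals with replacement, add them back to the fit, and refit.
        y_star = y_model + rng.choice(resid, size=resid.size, replace=True)
        p_star = np.polyfit(x, y_star, degree)
        curves[b] = np.polyval(p_star, x2)
    tail = (100 - level) / 2
    lower, upper = np.percentile(curves, [tail, 100 - tail], axis=0)
    return np.polyval(p, x2), lower, upper

heights = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65], dtype=float)
weights = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45], dtype=float)

x2 = np.linspace(heights.min(), heights.max(), 100)
y2, lower, upper = residual_bootstrap_band(heights, weights, x2)
# To draw it: ax.fill_between(x2, lower, upper, alpha=0.3)
```

Unlike the analytic band, this approach does not rely on the t-distribution, which is one reason it is described as the more advanced option.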
Updates

- stats.t.ppf() accepts the lower-tail probability, so t = sp.stats.t.ppf(0.95, n - m) was corrected to t = sp.stats.t.ppf(0.975, n - m) to reflect a two-sided 95% t-statistic (or one-sided 97.5% t-statistic). For these data, dof = n - m = 19 - 2 = 17.
- y2 was updated to respond more flexibly to a given model (@regeneration).
- An equation function was added to wrap the model function. Non-linear regressions are possible although not demonstrated. Amend the appropriate variables as needed (thanks @PJW).

See Also

- statsmodels library
- uncertainties library (install with caution in a separate environment)

Thanks to pylang for the answer. I had problems with the calculation of y2: when the regression line was decreasing, the confidence interval did not follow it. With the present calculation, y2 always spans from the minimum to the maximum of y_model, so it always increases. Therefore I changed the calculation of y2 to:
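To illustrate the problem described above with a sketch (not necessarily the exact change made in this answer): a min-to-max ramp is always increasing, so for a negative slope it runs opposite to the fitted line over x2. Evaluating the fitted model on x2 follows the line's actual direction. The data below are hypothetical, chosen only to have a decreasing trend.

```python
import numpy as np

# Hypothetical data with a decreasing trend.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.0, 8.1, 6.2, 3.9, 2.0])

p = np.polyfit(x, y, 1)
y_model = np.polyval(p, x)
x2 = np.linspace(x.min(), x.max(), 50)

# Min-to-max ramp: always increasing, regardless of the sign of the slope.
y2_ramp = np.linspace(y_model.min(), y_model.max(), 50)

# Model evaluated on the grid: follows the fitted line, so it decreases here.
y2 = np.polyval(p, x2)
```

For a straight line with negative slope, y2_ramp is exactly y2 reversed, which is why the band appears mirrored when it is centered on the ramp.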