How to get the numerical fitting results when plotting a regression in seaborn

Posted 2020-01-29 04:11

If I use the seaborn library in Python to plot the result of a linear regression, is there a way to find out the numerical results of the regression? For example, I might want to know the fitting coefficients or the R² of the fit.

I could re-run the same fit using the underlying statsmodels interface, but that would seem to be unnecessary duplicate effort, and anyway I'd want to be able to compare the resulting coefficients to be sure the numerical results are the same as what I'm seeing in the plot.

3 answers
ら.Afraid
#2 · 2020-01-29 04:56

There's no way to do this.

In my opinion, asking a visualization library to give you statistical modeling results is backwards. statsmodels, a modeling library, lets you fit a model and then draw a plot that corresponds exactly to the model you fit. If you want that exact correspondence, this order of operations makes more sense to me.

You might say "but the plots in statsmodels don't have as many aesthetic options as seaborn". But I think that makes sense — statsmodels is a modeling library that sometimes uses visualization in the service of modeling. seaborn is a visualization library that sometimes uses modeling in the service of visualization. It is good to specialize, and bad to try to do everything.

Fortunately, both seaborn and statsmodels use tidy data. That means it takes very little duplicated effort to get both plots and models from the appropriate tools.
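As a rough illustration of the point about tidy data, here is a minimal sketch of fitting the model with statsmodels and reading off the numbers. The DataFrame, its column names `x` and `y`, and the values are made up for the example; `smf.ols`, `params`, and `rsquared` are the standard statsmodels formula interface.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up tidy data for illustration; the column names are hypothetical.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.1, 3.9, 6.2, 7.8, 10.1],
})

# Fit y = intercept + slope * x by ordinary least squares.
results = smf.ols("y ~ x", data=df).fit()

# The fitted coefficients are indexed by term name.
intercept = results.params["Intercept"]
slope = results.params["x"]
r_squared = results.rsquared

print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.4f}")
```

The same DataFrame can then be passed to seaborn for the plot, for example `sns.regplot(x="x", y="y", data=df)`, so the only duplicated step is naming the two columns.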

Luminary・发光体
#3 · 2020-01-29 05:01

Looking through the currently available documentation, the closest thing I've found to this functionality is the scipy.stats.pearsonr function.

r2 = stats.pearsonr("pct", "rdiff", df)

When I tried to make it work directly with a pandas DataFrame, the call raised an error because it violates scipy's basic input requirements:

TypeError: pearsonr() takes exactly 2 arguments (3 given)
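The error arises because `pearsonr` takes two array-like arguments, not two column names plus a DataFrame. A minimal sketch of a call that should work, assuming a DataFrame with `pct` and `rdiff` columns (the values below are made up):

```python
import pandas as pd
from scipy import stats

# Made-up data for illustration; only the column names come from the post.
df = pd.DataFrame({
    "pct": [1.0, 2.0, 3.0, 4.0, 5.0],
    "rdiff": [2.1, 3.9, 6.2, 7.8, 10.1],
})

# Pass the two columns themselves, not their names and the DataFrame.
r, p_value = stats.pearsonr(df["pct"], df["rdiff"])

# For a simple linear regression with one predictor, R^2 is the square of r.
r_squared = r ** 2

print(f"r={r:.4f}, p={p_value:.4g}, R^2={r_squared:.4f}")
```

This gives the correlation and R² only. It does not return the slope or intercept of the fitted line.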

I managed to locate another Pandas Seaborn user who evidently solved it: https://github.com/scipy/scipy/blob/v0.14.0/scipy/stats/stats.py#L2392

sns.regplot("rdiff", "pct", df, corr_func=stats.pearsonr);

Unfortunately, I haven't managed to get that to work. It appears that either the author created a custom 'corr_func', or there's an undocumented Seaborn argument-passing method. The correlation can also be computed more manually, following the scipy source linked above:

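import numpy as np
from scipy.special import betainc

def pearsonr(x, y):
    # x and y should have the same length.
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    mx = x.mean()
    my = y.mean()
    xm, ym = x - mx, y - my
    r_num = np.add.reduce(xm * ym)
    # Sums of squares written inline, in place of scipy's internal ss() helper.
    r_den = np.sqrt(np.sum(xm ** 2) * np.sum(ym ** 2))
    r = r_num / r_den

    # Presumably, if abs(r) > 1, then it is only some small artifact of
    # floating point arithmetic.
    r = max(min(r, 1.0), -1.0)
    df = n - 2
    if abs(r) == 1.0:
        prob = 0.0
    else:
        t_squared = r * r * (df / ((1.0 - r) * (1.0 + r)))
        # betainc (regularized incomplete beta) stands in for the old betai.
        prob = betainc(0.5 * df, 0.5, df / (df + t_squared))
    return r, prob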

I hope this helps move the original request toward an interim solution. Adding regression fit statistics to the Seaborn package would be very useful, so that it could replace what one can easily get from MS Excel or a stock Matplotlib line plot.

\"骚年 ilove
#4 · 2020-01-29 05:07

Seaborn's creator has unfortunately stated that he won't add such a feature, so here's a workaround.

import copy

import matplotlib.pyplot as plt
import seaborn as sns

def regplot(*args, ax=None, line_kws=None, marker="o", scatter_kws=None, **kwargs):
    # this is the class that `sns.regplot` uses; the remaining arguments
    # (x, y, data, ...) are passed straight through to it
    plotter = sns.regression._RegressionPlotter(*args, **kwargs)

    # this is essentially the code from `sns.regplot`
    if ax is None:
        ax = plt.gca()

    scatter_kws = {} if scatter_kws is None else copy.copy(scatter_kws)
    scatter_kws["marker"] = marker
    line_kws = {} if line_kws is None else copy.copy(line_kws)

    plotter.plot(ax, scatter_kws, line_kws)

    # unfortunately the regression results aren't stored, so we rerun the fit
    # on the same axes that were just drawn on
    grid, yhat, err_bands = plotter.fit_regression(ax)

    # also unfortunately, this doesn't return the parameters, so we infer them
    slope = (yhat[-1] - yhat[0]) / (grid[-1] - grid[0])
    intercept = yhat[0] - slope * grid[0]
    return slope, intercept

Note that this only works for linear regression because it simply infers the slope and intercept from the regression results. The nice thing is that it uses seaborn's own regression class and so the results are guaranteed to be consistent with what's shown. The downside is of course that we're using a private implementation detail in seaborn that can break at any point.
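The inference step can be checked without seaborn. The sketch below uses `np.polyfit` as a stand-in for seaborn's fit (an assumption for illustration, not seaborn's own code path), evaluates the line on a grid, and recovers the slope and intercept from the endpoints. The helper names `line_params_from_grid` and `r_squared` are hypothetical, and the data is made up. It also shows one way to get R² once the line is known.

```python
import numpy as np

def line_params_from_grid(grid, yhat):
    """Recover slope and intercept from points sampled on a straight line."""
    slope = (yhat[-1] - yhat[0]) / (grid[-1] - grid[0])
    intercept = yhat[0] - slope * grid[0]
    return slope, intercept

def r_squared(x, y, slope, intercept):
    """Coefficient of determination for the line y = slope * x + intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Made-up data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Stand-in for the (grid, yhat) pair a fitted straight line would produce.
fit_slope, fit_intercept = np.polyfit(x, y, 1)
grid = np.linspace(x.min(), x.max(), 100)
yhat = fit_slope * grid + fit_intercept

# The endpoints of the sampled line are enough to recover both parameters.
slope, intercept = line_params_from_grid(grid, yhat)
r2 = r_squared(x, y, slope, intercept)

print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.4f}")
```

Because only the first and last grid points are used, the recovered values are meaningful only when the fitted curve really is a straight line, which is the limitation noted above.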
