How to extract fitted values of GAM {mgcv} for eac

Posted 2019-03-22 08:49

Question:

I'm looking for a way to extract the predicted (real, not standardized) values for every single term in my model:

> m13<-gam(LN_Brutto~s(agecont,by=Sex)+factor(Sex)+te(Month,Age)+s(Month,by=Sex), data=bears)

This is the summary of my model:

> summary(m13)

Family: gaussian 
Link function: identity 

Formula:
LN_Brutto ~ s(agecont, by = Sex) + factor(Sex) + te(Month, Age) + 
    s(Month, by = Sex)

Parametric coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   4.32057    0.01071  403.34   <2e-16 ***
factor(Sex)m  0.27708    0.01376   20.14   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                    edf  Ref.df      F  p-value    
s(agecont):Sexf  8.1611  8.7526 20.170  < 2e-16 ***
s(agecont):Sexm  6.6695  7.5523 32.689  < 2e-16 ***
te(Month,Age)   10.3651 12.7201  6.784 2.19e-12 ***
s(Month):Sexf    0.9701  0.9701  0.641    0.430    
s(Month):Sexm    1.3750  1.6855  0.193    0.787    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Rank: 60/62
R-sq.(adj) =  0.781   Deviance explained = 78.7%
GCV = 0.048221  Scale est. = 0.046918  n = 1093

The predicted values are produced by this code:

> pred<-predict(m13, type = "terms")

And the result looks like this:

    factor(Sex) s(agecont):Sexf s(agecont):Sexm te(Month,Age)   s(Month):Sexf   s(Month):Sexm
1   0.2770806   0.000000000     0.111763696     -0.077845764    0.000000000     0.0007840912
2   0.2770806   0.000000000     0.240016156     -0.049143798    0.000000000     0.0007840912
3   0.2770806   0.000000000     0.034328752     0.046524454     0.000000000     -0.0058871897
4   0.0000000   -0.786533918    0.000000000     -0.067942427    0.021990192     0.0000000000
5   0.0000000   0.074434715     0.000000000     0.046524454     0.021990192     0.0000000000
6   0.0000000   0.161121563     0.000000000     0.089599601     0.021990192     0.0000000000
7   0.0000000   0.074434715     0.000000000     0.046524454     0.021990192     0.0000000000
8   0.2770806   0.000000000     -0.298597370    -0.007877328    0.000000000     -0.0058871897
...

But I guess these are just standardized predicted values and not the real values (the real ones should have no negative values!?).

So does anyone know what I have to modify in the code, to get the real values? Any idea? Thank you!

Answer 1:

Not quite sure if I follow you correctly, but predict(model, type = "terms") might be the solution you're looking for.

Update

I don't think these are standardised. Possibly some of the coefficients are just negative.

Consider the example from the help file ?mgcv:::predict.gam:

library(mgcv)
n<-200
sig <- 2
dat <- gamSim(1,n=n,scale=sig)

b<-gam(y~s(x0)+s(I(x1^2))+s(x2)+offset(x3),data=dat)

The results below show that these are the contributions each term makes to the fitted values: summing the contributions across terms, then adding the intercept (stored in the "constant" attribute) and the offset, reproduces the output of predict(b).

> head(predict(b))
        1         2         3         4         5         6 
 9.263322  2.822200  7.137201  4.902631 14.558401 11.889092 
> head(rowSums(predict(b, type = "terms")) + attr(predict(b, type = "terms"), "constant") + dat$x3)
        1         2         3         4         5         6 
 9.263322  2.822200  7.137201  4.902631 14.558401 11.889092 
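
The same check can be used to attach the per-term contributions to the data, which is what the question asks for. The following is a minimal sketch on simulated data from gamSim, since the bears data are not available here; the object names (dat, fit, terms_mat, out, rebuilt) and the simplified formula are illustrative assumptions, not the asker's model.

```r
library(mgcv)

set.seed(1)  # assumption: fixed seed so the simulated data are reproducible
dat <- gamSim(1, n = 200, scale = 2, verbose = FALSE)
fit <- gam(y ~ s(x0) + s(x1) + s(x2), data = dat)

# One column per model term, on the scale of the linear predictor
terms_mat <- predict(fit, type = "terms")
intercept <- attr(terms_mat, "constant")

# Attach the per-term contributions to the original data
out <- cbind(dat, terms_mat)

# Smooths are centred, so each column averages to roughly zero;
# negative entries are therefore expected
colMeans(terms_mat)

# Contributions plus intercept reproduce the fitted values
rebuilt <- rowSums(terms_mat) + intercept
all.equal(unname(rebuilt), as.numeric(fitted(fit)))
```

Because each smooth is centred, a single column is a deviation from the overall level, not a prediction in its own right. Adding the intercept to a column gives that term's partial prediction with the other smooths held at their average effect.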


Answer 2:

To return predicted values on the same scale as the response, call predict(model, type = "response").

The default for predict.gam is type = "link", which returns the linear predictor; standard errors are returned only if you also set se.fit = TRUE. With type = "terms", each column is a centred contribution to the linear predictor, which is why you found both positive and negative values.

Read more on the ?mgcv::predict.gam help page.
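
The difference between the two scales only shows up when the link is not the identity. The sketch below uses a Poisson model on data simulated with gamSim to illustrate it; the object names (dat, fit, eta, mu, p) are illustrative assumptions. For a Gaussian model with an identity link, such as the one in the question, type = "link" and type = "response" return the same values.

```r
library(mgcv)

set.seed(2)  # assumption: fixed seed so the simulated data are reproducible
dat <- gamSim(1, n = 200, dist = "poisson", scale = 0.1, verbose = FALSE)
fit <- gam(y ~ s(x0) + s(x1) + s(x2), family = poisson, data = dat)

eta <- predict(fit)                     # default type = "link": log scale here
mu  <- predict(fit, type = "response")  # expected counts, always positive

# The response scale is the inverse link applied to the linear predictor
all.equal(as.numeric(exp(eta)), as.numeric(mu))

# Standard errors are only returned on request
p <- predict(fit, type = "link", se.fit = TRUE)
names(p)
```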



Tags: r gam mgcv