Linear Regression calculation several times in one

2019-08-09 15:51发布

I am using R to evaluate climate data and I have a data set that looks like the following miniaturized version... please forgive my crude posting etiquette, I hope this post is understandable.

[0][STA.NAME] [YEAR] [SUM.CDD]  
1 NAME1 1967    760  
2 NAME1 1985    800  
3 NAME1 1996    740  
4 NAME1 2003    810  
5 NAME1 2011    790  
6 NAME2 1967    700  
7 NAME2 1985    690  
8 NAME2 1996    850  
9 NAME2 2003    790  
10 NAME3    1967    760  
11 NAME3    1985    800  
12 NAME3    1990    740  
13 NAME3    1996    810  
14 NAME3    2003    790  
15 NAME3    2011    800  

I am trying to return a new DF with this

[STA.NAME] [Eq'n of trend]  
NAME1  (y = mx + b)  
NAME2  (y = mx + b)  

etc...

Eventually I will need to calculate variance of the trends, as well as total variance of data and would like to eventually append those to this resulting data set for something like...

[STA.NAME] [TREND] [VAR.TREND] [VAR.DATA]   
with values in rows, 1 for each STA.NAME...

Any help is greatly appreciated, If there is a better way than lm(), with which I am currently stumped, I would be interested as well.

Thank you very much,

Jesse

1条回答
smile是对你的礼貌
2楼-- · 2019-08-09 16:20

Here is a simple solution using ddply() from plyr to return the coefficients for each group:

First replicate the data:

x <- read.table(text="
STA.NAME YEAR SUM.CDD  
1 NAME1 1967    760  
2 NAME1 1985    800  
3 NAME1 1996    740  
4 NAME1 2003    810  
5 NAME1 2011    790  
6 NAME2 1967    700  
7 NAME2 1985    690  
8 NAME2 1996    850  
9 NAME2 2003    790  
10 NAME3    1967    760  
11 NAME3    1985    800  
12 NAME3    1990    740  
13 NAME3    1996    810  
14 NAME3    2003    790  
15 NAME3    2011    800  ", header=TRUE)

Now do the modelling:

library(plyr)
ddply(x, .(STA.NAME), function(z)coef(lm(SUM.CDD ~ YEAR, data=z)))

  STA.NAME (Intercept)      YEAR
1    NAME1   -444.8361 0.6147541
2    NAME2  -6339.2047 3.5702200
3    NAME3   -995.2381 0.8928571

Now, depending on what you want to do, it may be simpler (and perhaps more meaningful) to create a single model of your data:

fit <- lm(SUM.CDD ~ YEAR + STA.NAME, data=x)

Get a summary:

summary(fit)

Call:
lm(formula = SUM.CDD ~ YEAR + STA.NAME, data = x)

Residuals:
   Min     1Q Median     3Q    Max 
-63.57 -22.21  10.72  18.62  80.72 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)  
(Intercept)   -2065.6401  1463.5353  -1.411   0.1858  
YEAR              1.4282     0.7345   1.945   0.0778 .
STA.NAMENAME2   -15.8586    27.5835  -0.575   0.5769  
STA.NAMENAME3     3.9046    24.7089   0.158   0.8773  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 40.8 on 11 degrees of freedom
Multiple R-squared: 0.3056, Adjusted R-squared: 0.1162 
F-statistic: 1.614 on 3 and 11 DF,  p-value: 0.2424 

Extract only the coefficients:

coef(fit)
  (Intercept)          YEAR STA.NAMENAME2 STA.NAMENAME3 
 -2065.640078      1.428247    -15.858650      3.904632 

Finally, you perhaps wanted to fit a model with interaction terms. This model gives you effectively the same results as the original plyr solution. Depending on your data and your objectives, this might be the way to do it:

fit <- lm(SUM.CDD ~ YEAR * STA.NAME, data=x)
summary(fit)

Call:
lm(formula = SUM.CDD ~ YEAR * STA.NAME, data = x)

Residuals:
    Min      1Q  Median      3Q     Max 
-57.682 -13.166  -1.012  23.006  63.046 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)         -444.8361  2280.7464  -0.195    0.850
YEAR                   0.6148     1.1447   0.537    0.604
STA.NAMENAME2      -5894.3687  3661.9795  -1.610    0.142
STA.NAMENAME3       -550.4020  3221.8390  -0.171    0.868
YEAR:STA.NAMENAME2     2.9555     1.8406   1.606    0.143
YEAR:STA.NAMENAME3     0.2781     1.6172   0.172    0.867

Residual standard error: 39.17 on 9 degrees of freedom
Multiple R-squared: 0.4763, Adjusted R-squared: 0.1854 
F-statistic: 1.637 on 5 and 9 DF,  p-value: 0.2451 
查看更多
登录 后发表回答