How to fit linear regression model to get maximum

2019-08-21 04:43发布

问题:

I am trying to develop a Rainfall-Runoff model for a Canadian watershed. In my study watershed, antecedent precipitation both summer and winter play critical role in predicting Spring runoff (April). In the following code, I grab, summer (August to October) and winter precipitation (November to March) as well as Spring runoff (April).

install.packages("weathercan")
install.packages("tidyhydat")
library(weathercan)
library(tidyhydat)
library(tidyverse)

Met_Data = weather_dl(station_ids = 2925, start = "1986-01-01", end = "2005-12-31", interval = "month", trim = FALSE, format = TRUE)[c(12,13,30)]
Met_Data$total_precip[is.na(Met_Data$total_precip)] = 20

# summer precipitation
Summer_PCP = Met_Data %>% filter(month == "08"| month == "09"| month == "10")
Summer_PCP = spread(Summer_PCP, key = month, value = total_precip)
colnames(Summer_PCP) = c("Year", "Aug", "Sep", "Oct")

# Winter precipitation
Winter_PCP1 = Met_Data %>% filter(month == "11"| month == "12") %>% group_by(year) %>% summarise(TotalPCP=sum(total_precip))
Winter_PCP2 = Met_Data %>% filter(month == "01"| month == "02"| month == "03") %>% group_by(year) %>% summarise(TotalPCP=sum(total_precip))
Winter_PCP1$year = as.numeric(Winter_PCP1$year)+1
Winter_PCP = merge(Winter_PCP1,Winter_PCP2, by="year")
Winter_PCP$TotalPCP = Winter_PCP$TotalPCP.x+Winter_PCP$TotalPCP.y
Winter_PCP = Winter_PCP[c(1,4)]

# Runoff Data
Runoff <- hy_daily_flows (station_number = "05JE001", start_date = "1986-01-01", end_date = "2005-12-31")[c(2,4)]
Runoff$Year=as.numeric (format(Runoff$Date,"%Y"))
Runoff$Month=as.numeric (format(Runoff$Date,"%m"))
Runoff$Day=as.numeric (format(Runoff$Date,"%d"))
Runoff_Monthly=Runoff %>% select(Year, Month, Day, Value) %>% group_by(Year, Month) %>% summarise(TotalRunoff=sum(Value*3600*24)) %>% filter(Month==4)

I am assigning equal weight to both Winter and summer precipitation (50% or 0.5)- so in my code below I am multiplying both winter and summer precipitation with 0.5. Now here comes the tricky part, and to be honest the most difficult part for me. I would like to fit coefficient ranging from 0 to 1 to each of my summer precipitation such that

  1. a + b + c = 1
  2. a < b < c

Below is the equation that I would like to fit in such a way (i.e., by repeating values for a, b, and c coefficient) that I get maximum correlation (higher R-squared) value for Runoff verses Rainfall.

Fit a linear model and find coefficients that would result maximum R-squared value

Establish_Relation = lm(Runoff_Monthly ~ [0.5 * Winter_PCP]  +  [0.5 * (a*Summer_PCP$Aug + b*Summer_PCP$Sep + c*Summer_PCP$Oct)])

Update

So with the help of @r2evans, I was able to derive a set of coefficient meeting the conditions as stated above. what I want now is to pass these coefficient one by one through the Establish_Relation equation (possibly through for loop?) and retained the coefficient that would give me maximum R-squared value. Here is the code for generating coefficients

myrandom <- function(n) {
  m <- matrix(runif(3*n), ncol=3)
  m <- cbind(m, rowSums(m)) # rowSums is efficient
  t(apply(m, 1, function(a) sort(a[1:3] / a[4])))
}

m = myrandom(500)
colnames(m)=c("a","b","c")