-->

R: Summarize inside an nlsLM() statement

2020-03-27 08:26发布

问题:

I'm using nlsLM() to make a model of a power function, but I need to summarize my data within the function call to find the appropriate coefficient and exponent. More specifically, here is what my model code looks like:

Jmod = nlsLM(value~(a)*summarise(funs(mean), (MW)^b),
                 start = list(a=100000, b = 1/3), data = mod_data,
                 upper = c(Inf,1), lower = c(0,1/5))

where MW is the data I am trying to use to predict value. The data for MW is already grouped by month based off a variable called datetime, so I would like to take the monthly average of MW^b where b is found by the nlsLM() statement.

I would take the average beforehand, but as you may realize, this is not mathematically equivalent [ie. ((a+c)/2)^b is not equal to (a^b + c^b)/2].

If anyone has any info on how to do this, I would be greatly appreciative!


EDIT:

Below is the code to make a sample data set of what I'm trying to work with:

structure(list(datetime = structure(c(1514782800, 1480568400,1504242000, 1509512400, 1509512400, 1485925200, 1517461200, 1485925200, 1501563600, 1467349200, 1472706000, 1454302800, 1483246800, 1498885200, 1506834000, 1477976400, 1483246800, 1477976400, 1509512400, 1496293200, 1451624400, 1454302800, 1454302800, 1464757200, 1498885200, 1517461200, 1462078800, 1506834000, 1522558800, 1483246800, 1501563600, 1451624400, 1485925200, 1501563600, 1451624400, 1517461200, 1475298000, 1480568400, 1512104400, 1456808400, 1477976400, 1475298000, 1517461200, 1459486800, 1501563600, 1477976400, 1506834000, 1506834000, 1451624400, 1483246800), class = c("POSIXct", "POSIXt"), tzone = ""), value = c(2863.27837518519, 2878.40382333333, 1236.74444444444, 3522.48888888889, 3522.48888888889, 2033.55555555556, 3305.5, 2033.55555555556, 2094.7037037037, 3052.91875740741, 2960.52222222222, 1733.7918262963, 2850.28673851852, 2841.40740740741, 3310.77538814815, 2266.26172851852, 2850.28673851852, 2266.26172851852, 3522.48888888889, 2802.55555555556, 2196.82556740741, 1733.7918262963, 1733.7918262963, 3001.43703703704, 2841.40740740741, 3305.5, 2061.4826762963, 3310.77538814815, 3107.01851851852, 2850.28673851852, 2094.7037037037, 2196.82556740741, 2033.55555555556, 2094.7037037037, 2196.82556740741, 3305.5, 2848.90322592593, 2878.40382333333, 2873.73476703704, 2208.64755074074, 2266.2172851852, 2848.90322592593, 3305.5, 2021.68765444444, 2094.7037037037, 2266.26172851852, 3310.77538814815, 3310.77538814815, 2196.82556740741, 2850.28673851852), mon = structure(c(2018, 2016.91666666667, 2017.66666666667, 2017.83333333333, 2017.83333333333, 2017.08333333333, 2018.08333333333, 2017.08333333333, 2017.58333333333, 2016.5, 2016.66666666667, 2016.08333333333, 2017, 2017.5, 2017.75, 2016.83333333333, 2017, 2016.83333333333, 2017.83333333333, 2017.41666666667, 2016, 2016.08333333333, 2016.08333333333, 2016.41666666667, 2017.5, 2018.08333333333, 2016.33333333333, 2017.75, 2018.25, 2017, 2017.58333333333, 2016, 2017.08333333333, 2017.58333333333, 2016, 2018.08333333333, 2016.75, 2016.91666666667, 2017.91666666667, 2016.16666666667, 2016.83333333333, 2016.75, 2018.08333333333, 2016.25,2017.58333333333,2016.83333333333, 2017.75, 2017.75, 2016, 2017), class = "yearmon"), 
MW = c(2.6142700774997, 4.02670249993547, 0.666666666666667, 
0.724114015571947, 4.07197668868287, 3.74122386862433, 3.30097429092907, 
3.84858110028323, 0.666666666666667, 0.666666666666667, 4.35000446878457, 
0.666666666666667, 0.666666666666667, 3.8371824280444, 0.825077317374, 
0.666666666666667, 4.028058457579, 0.666666666666667, 4.3378032532779, 
3.84270845997837, 1.40955788986009, 0.666666666666667, 0.666666666666667, 
4.05845600900597, 4.00664052392117, 4.0295346724872, 0.666666666666667, 
4.14159923664523, 4.231951299842, 3.9562222817766, 0.666666666666667, 
3.61602795165213, 0.666666666666667, 3.58079262746603, 4.12197770915903, 
4.2610646492437, 4.02152528469467, 1.0117763092792, 2.03648922832252, 
0.666666666666667, 0.666666666666667, 3.8042476910097, 3.91787334748133, 
0.666666666666667, 0.666666666666667, 0.89571472289964, 4.1530002677697, 
3.93733212731873, 0.710671314318797, 0.666666666666667)), .Names = c("datetime", "value", "mon", "MW"), row.names = c(39113L, 12946L, 4365L, 37505L, 36601L, 31055L, 39814L, 31433L, 32105L, 20668L, 18191L, 8328L, 10232L, 25689L, 35528L, 4577L, 10302L, 5146L, 37975L, 29670L, 28429L, 7932L, 8468L, 23120L, 25111L, 39699L, 24312L, 36246L, 1556L, 11068L, 33269L, 29163L, 31685L, 32419L, 29059L, 40618L, 16751L, 11737L, 34371L, 6001L, 4864L, 16413L, 40304L, 8716L, 33190L, 5399L, 35610L, 36462L, 28338L, 10371L), class = "data.frame")

This will create the mod_data that I am using to make the model. And just to reiterate, what I have done here is group data by month, found in the mon column of my data, and now I want to summarize the data by month, but with the exponent included in the mean, as seen in my above code. Thanks again!

回答1:

What you want isn't clear from the question. Do you want to fit a separate model for every unique month in your data? Or do you want to fit one model for all of the data and then take monthly averages of the value of MW^b ?

Here's one way on how to do the latter case.

require(minpack.lm)
require(tidyverse)
require(broom)

dat <- structure(...) # provided in the question

predictions <- 
    dat %>% 
    ungroup %>%
    mutate(row = row_number()) %>%
    do(augment(nlsLM(
                formula = value ~ a * MW^b + 0*row, 
                data = .,
                start = list(a = 100000, b=1/3),
                upper = c(Inf, 1), 
                lower = c(0, 1/5)
               )
           )
       )

joined <- 
    dat %>%
    mutate(row = row_number()) %>%
    left_join(predictions, by=c('MW', 'value', 'row')) %>%
    select(-row)

joined %>%
    group_by(mon) %>%
    mutate(monthly_avg_prediction = mean(.fitted))

Notes:

  1. This kind of stuff is much easier with the broom package. This is because broom converts the results of model-finding functions like lm, nls, or nlsLM etc. into dataframes. So you don't have to memorize or re-lookup the function-specific structure of the model object (e.g. model$params[['estimate']][[1]]) or similar stuff; the model results are already in R-standard dataframe format.
  2. My solution uses ideas from this StackOverflow answer on how to join broom-generated dataframes of predictions with original data. That's why the stuff with row_number() and left_join() are in there. Otherwise, in the general case, augment will throw away data from the original dataframe that is not used in the model prediction, and it will not work well if there are repeated values in the data that is used.
  3. The .fitted column is generated by broom's augment function. It is the model prediction at the indicated datapoint.
  4. The result (that I think you might want) is in the monthly_avg_prediction column of the joined dataframe. But that represents a single, global model, fit on all the data, and predictions from that model are averaged by month.


标签: r dplyr nls