I am trying to use augment on a loess fit, but I receive the following error:
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 32, 11
In the error message, 11 happens to equal the number of observations in one segment and 32 is the total number of observations. The code is below.
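library(broom)
library(dplyr)
library(ggplot2)  # needed for the plotting code further down
library(stringr)  # needed for str_c() in the plot titles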
# This example uses the lm method and it works
regressions <- mtcars %>% group_by(cyl) %>% do(fit = lm(wt ~ mpg, .))
regressions %>% augment(fit)
# This example uses the loess method and it generates the error
regressions2 <- mtcars %>% group_by(cyl) %>% do(fit = loess(wt ~ mpg, .))
regressions2 %>% augment(fit)
# The code below plots the loess fit for each cyl group using geom_smooth.
# My current workaround is to do a global definition as an aes object in geom_smooth.
cylc <- unique(mtcars$cyl) %>% sort()
for (i in seq_along(cylc)) {
  print(i)
  print(cylc[i])
  p <- ggplot(data = filter(mtcars, cyl == cylc[i]), aes(x = mpg, y = wt)) +
    geom_point() +
    geom_smooth(method = "loess") +
    ggtitle(str_c("cyl = ", cylc[i]))
  print(p)
}
This appears to be a problem related to the do() operator: when we check the model.frame() on one of the LOESS model objects, we get back all 32 rows rather than the subset corresponding to that model.

A workaround is to hold on to the data and not just the model, and pass that as the second argument to augment(). This is usually recommended with augment() anyway, since model.frame() doesn't get all the original columns.

Incidentally, I'm the maintainer of broom and I generally no longer recommend the do() approach (since dplyr has mostly been moving away from it). Instead, I suggest using tidyr's nest() and purrr's map(), as described in this chapter of R4DS. This makes it a little bit easier to hold on to the data and incorporate it into augment().
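
A minimal sketch of the nest()/map() approach might look like the following. It is an illustration rather than a definitive recipe: the column names data, fit, and augmented are arbitrary choices, and the nest(data = -cyl) syntax assumes tidyr 1.0 or later. The key point is that each group's data stays next to its model, so it can be passed to augment() explicitly instead of relying on model.frame().

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

fits <- mtcars %>%
  # One row per cyl value; the remaining columns go into a list-column "data"
  nest(data = -cyl) %>%
  mutate(
    # Fit a separate loess model on each group's own subset
    fit = map(data, ~ loess(wt ~ mpg, data = .x)),
    # Pass the matching subset as the data argument of augment()
    augmented = map2(fit, data, ~ augment(.x, data = .y))
  )

# Flatten back to one row per original observation
augmented <- fits %>%
  select(cyl, augmented) %>%
  unnest(augmented)
```

loess may emit warnings for the smaller groups, since there are few observations per cyl value. The result should have one row per row of mtcars, with the fitted values added alongside the original columns.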