r predict function returning too many values [clos

Posted 2019-01-15 17:34

Question:

I've read other postings regarding named variables and tried implementing the answers but still get too many values for my new data that I want to run my existing model on. Here is working example code:

set.seed(123)
mydata <- data.frame("y"=rnorm(100,mean=0, sd = 1),"x"=c(1:100))

mylm <- lm(y ~ x, data=mydata)

# OK, so mylm is a model fitted on 100 points - let's look at it and the data
par(mfrow=c(2,2))
plot(mylm)
par(mfrow=c(1,1))
predvals <- predict(mylm, data=mydata)
plot(mydata$x,mydata$y)
lines(predvals)

No surprises here: a straight line through the generated points, both 100 observations in length. Now I generate 20 points of new data with the exact same column names, and when I run the new data through predict() I expect to get 20 values but instead I get 100. What am I missing? It's driving me crazy.

newdata <- data.frame("y"=rnorm(20,mean=0, sd = 1), "x"=c(1:20))
predvals <- predict(mylm, data=newdata)
length(newdata$y)
length(predvals)    

# quick (not elegant) way to look at it:
plot(predvals)
lines(newdata$x,newdata$y)

Do I need to tell predict() to only use 20 points or something like that?

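Your issue is in predvals <- predict(mylm, data=newdata).

The correct call is predict(mylm, newdata=newdata). For lm objects, predict() takes a named argument newdata, not data. Because the method also accepts ..., the unrecognised data= argument is silently ignored, newdata is treated as missing, and predict() returns the fitted values for the original 100 training rows. That is why you get 100 values and no error.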
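As a minimal sketch of the difference, the snippet below refits the model from the question and compares the two calls. The names wrong and right are only illustrative, and the y column is left out of newdata on the assumption that prediction only needs the predictor x.

```r
set.seed(123)
mydata <- data.frame(y = rnorm(100, mean = 0, sd = 1), x = 1:100)
mylm <- lm(y ~ x, data = mydata)

# New data only needs the predictor column(s) used in the formula
newdata <- data.frame(x = 1:20)

# Misnamed argument: `data` falls into `...` and is ignored,
# so predict() returns fitted values for the 100 training rows
wrong <- predict(mylm, data = newdata)
length(wrong)   # 100

# Correctly named argument: one prediction per row of newdata
right <- predict(mylm, newdata = newdata)
length(right)   # 20
```

Checking length(predvals) against nrow(newdata) right after the call is a cheap way to catch this kind of silently ignored argument.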