I loaded the built-in R data set 'women', which contains a table of average American women's heights and corresponding weights. The table has 15 rows. Using this data I am trying to predict the weight for specific values of height. I first fitted a linear model and then supplied new values to predict, but R still returns the 15 fitted values from the original data.
I am a beginner in regression so please tell me if I am doing anything wrong here.
> data()
> women<-data.frame(women)
> names(women)
[1] "height" "weight"
> plot(women$weight~women$height)
> model<-lm(women$weight~women$height,data=women)
> new<-data.frame(height=c(82,83,84,85))
> wgt.prediction<-predict(model,new)
Warning message:
'newdata' had 4 rows but variables found have 15 rows
> wgt.prediction
       1        2        3        4        5        6        7        8        9       10       11       12       13
112.5833 116.0333 119.4833 122.9333 126.3833 129.8333 133.2833 136.7333 140.1833 143.6333 147.0833 150.5333 153.9833
      14       15
157.4333 160.8833
The first predict will work because the model's predictor variable is wt and the new data has a variable wt as well. The second predict will not work because the model's predictor variable is dt$wt, so every time the model will go back to dt to get the variable wt. In fact, no matter what your new data set looks like, the model will try to predict using dt$wt.

Note that extrapolating predictions outside the range of the original data can give poor answers; however, ignoring that, try the following.
First, it is not necessary to use data() or data.frame. women will be available to you anyway, and it is already a data frame.

Also, the model's independent variable was specified in the question as women$height, but the prediction specified it as height. predict does not know that women$height and height are the same.

Replace all your code with this:
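A minimal sketch of such a replacement, assuming the variable names model, new and weights (the last two are referred to later in this answer). The key change is that the formula uses bare column names and leaves the lookup to the data argument, so predict can find height in the new data frame:

```r
# Fit with bare column names; lm() looks them up in `data`
model <- lm(weight ~ height, data = women)

# New heights to predict for; the column name must match the formula term
new <- data.frame(height = c(82, 83, 84, 85))

# predict() now finds `height` in `new` and returns one value per row
weights <- predict(model, new)
weights
```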
giving these predicted weights for heights 82 to 85:
       1        2        3        4
195.3833 198.8333 202.2833 205.7333
To plot the data with the predictions (i.e. with weights) and the regression line defined by model (continued after graph):

Although normally one uses predict, given the problem introduced by using $ in the formula, an alternative using your original formulation would be to calculate the predictions like this:

giving:
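As a rough sketch of the two steps above, here is one possible way to do the manual calculation and the plot. It assumes the question's original model (with $ in the formula) and computes intercept + slope * height directly from the coefficients. The names b and weights, and the plotting choices (axis limits, red filled points), are illustrative only.

```r
# The question's original formulation, with $ inside the formula
model <- lm(women$weight ~ women$height, data = women)
new <- data.frame(height = c(82, 83, 84, 85))

# Manual prediction from the fitted coefficients, bypassing predict()
b <- coef(model)                      # b[1] = intercept, b[2] = slope
weights <- unname(b[1] + b[2] * new$height)
weights
# [1] 195.3833 198.8333 202.2833 205.7333

# Plot the data, the extrapolated predictions and the regression line;
# the axis limits are widened so the new points are visible
plot(weight ~ height, data = women,
     xlim = range(women$height, new$height),
     ylim = range(women$weight, weights))
points(new$height, weights, col = "red", pch = 19)
abline(model)
```

This works because the coefficients do not depend on how the variables were named in the formula; only predict's lookup of the new data does.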