Unfortunately, I have problems using predict() in the following simple example:
library(e1071)
x <- c(1:10)
y <- c(0,0,0,0,1,0,1,1,1,1)
test <- c(11:15)
mod <- svm(y ~ x, kernel = "linear", gamma = 1, cost = 2, type="C-classification")
predict(mod, newdata = test)
The result is as follows:
> predict(mod, newdata = test)
1 2 3 4 <NA> <NA> <NA> <NA> <NA> <NA>
0 0 0 0 0 1 1 1 1 1
Can anybody explain why predict() only gives the fitted values of the training sample (x,y) and does not care about the test-data?
Thank you very much for your help!
Richard
You need newdata to be of the same form as the training data, i.e. using a data.frame helps:
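A minimal sketch of such a call, assuming the model from the question is kept unchanged and only test is wrapped in a data.frame whose column is named x (the name pred is made up for illustration):

```r
library(e1071)

# Training data and model exactly as in the question
x <- c(1:10)
y <- c(0,0,0,0,1,0,1,1,1,1)
test <- c(11:15)
mod <- svm(y ~ x, kernel = "linear", gamma = 1, cost = 2, type = "C-classification")

# newdata needs a column named like the predictor in the formula (x)
pred <- predict(mod, newdata = data.frame(x = test))
length(pred)  # one prediction per row of newdata
```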
By the way, this is also shown on the help page for svm().
So in sum, use the formula interface and supply a data.frame; that is how essentially all modeling functions in R work.
It looks like this is because you misuse the formula interface to svm(). Normally, one supplies a data frame or similar object within which the variables in the formula are searched for. It usually doesn't matter if you don't do this, even if it is not best practice, but when you want to predict, not putting variables in a data frame gets you in a right mess. The reason it returns the training data is that you don't provide newdata with an object that has a component named x in it. Hence it can't find the new data x, so it returns the fitted values. This is common for most R predict methods I know.
The solution then is to i) put your training data in a data frame and pass it to svm as the data argument, and ii) supply a new data frame containing x (from test) to predict(). E.g.:
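A sketch of those two steps, using the data from the question; the names train, newdat and pred are made up here for illustration:

```r
library(e1071)

# i) training data in a data frame, passed via the data argument
train <- data.frame(x = 1:10, y = c(0,0,0,0,1,0,1,1,1,1))
mod <- svm(y ~ x, data = train, kernel = "linear", gamma = 1, cost = 2,
           type = "C-classification")

# ii) new data frame with a column named x, built from test
test <- c(11:15)
newdat <- data.frame(x = test)
pred <- predict(mod, newdata = newdat)

# sanity check: predictions correspond to the rows of newdat
stopifnot(length(pred) == nrow(newdat))
pred
```

With this setup predict() should return one value per test point rather than the ten fitted values of the training sample.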