Machine learning in R using linear regression

Posted 2019-07-24 17:51

Question:

I am using R for machine learning. My project scenario is as follows. I use MongoDB for storage, with one collection to which a new document is added every 5 minutes. A document in the collection looks like this:

{
    "_id" : ObjectId("521c980624c8600645ad23c8"),
    "TimeStamp" : 1377605638752,
    "cpuUsed" : -356962527,
    "memory" : 2057344858,
    "hostId" : "200.2.2.2"
}
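To build the date or timestamp values that go into newdata, it helps to know the units of TimeStamp. A minimal sketch, assuming TimeStamp is Unix epoch time in milliseconds (inferred from its 13-digit magnitude, not stated in the question): the value in the example document converts to a date in August 2013, while the 1377678051 passed to predict in the question's code only gives a sensible date when read as seconds. The variable names ts_ms, ts and new_ts are illustrative.

```r
# Assumption: TimeStamp is Unix epoch time in milliseconds.
ts_ms <- 1377605638752
ts <- as.POSIXct(ts_ms / 1000, origin = "1970-01-01", tz = "UTC")
format(ts, "%Y-%m-%d %H:%M:%S")  # "2013-08-27 12:13:58"
as.Date(ts)                      # "2013-08-27"

# The value used for prediction in the question reads as seconds, not ms.
new_ts <- as.POSIXct(1377678051, origin = "1970-01-01", tz = "UTC")
format(new_ts, "%Y-%m-%d %H:%M:%S")  # "2013-08-28 08:20:51"
```

Mixing the two units in one model would place the new point decades away from the training data, so whichever column is used, the values in newdata should be converted the same way as the training values.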

Now, using these documents, I want to predict the cpuUsed and memory values for the next 5 minutes, 10 minutes, or 24 hours. For that I wrote the following R code:

library('RMongo')
mg1 <- mongoDbConnect('dbname')
query <- dbGetQuery(mg1,'test',"{'hostId' : '200.2.2.2'}")
data1 <- query[]
cpu <- query$cpuUtilization
memory <- query$memory
new <- data.frame(data=1377678051) # set timestamp for calculating results
predict(lm(cpu ~   data1$memory + data1$Date ), new, interval="confidence")

But when I execute the above code, it shows the following output:

             fit        lwr       upr
    1  427815904  -37534223 893166030
    2 -110791661 -368195697 146612374
    3  137889445 -135982781 411761671
    4 -165891990 -445886859 114102880
    .
    .
    .
    n

From this output I can't tell which cpuUsed value is the prediction for my new timestamp. If anyone knows, please help me. Thank you.

Answer 1:

The newdata argument of predict needs to contain the variables used in the fit, under the same column names:

new <- data.frame(memory = 1377678051, Date = as.Date("2013-08-28"))

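Only then is it actually used for prediction; otherwise R looks the variables up where the model was fitted, and you get the fitted values for the training data, one row per observation, as in the output above. This also requires fitting the model with bare column names and a data argument, e.g. lm(cpuUsed ~ memory + Date, data = data1): terms written as data1$memory are evaluated against data1 itself, so newdata is ignored even when its column names match.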
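A minimal sketch of the difference, using synthetic data in place of the MongoDB query (all values are hypothetical) and the numeric TimeStamp column instead of a Date column, since the documents are only 5 minutes apart. The names fit_bad, fit_good, new_bad and new_good are illustrative.

```r
# Synthetic stand-in for the MongoDB query result (hypothetical values):
# one document every 5 minutes for one day, TimeStamp in epoch milliseconds.
set.seed(42)
n <- 288
data1 <- data.frame(
  TimeStamp = 1377605638752 + (0:(n - 1)) * 5 * 60 * 1000,
  memory    = runif(n, 1.9e9, 2.1e9)
)
data1$cpuUsed <- 0.2 * data1$memory + rnorm(n, sd = 1e7)

# As in the question: data1$ inside the formula, newdata column named "data".
cpu <- data1$cpuUsed
fit_bad <- lm(cpu ~ data1$memory + data1$TimeStamp)
new_bad <- data.frame(data = 1377678051)
# newdata is ignored: R warns about the row-count mismatch and returns one
# row per training observation, i.e. the fitted values.
p_bad <- suppressWarnings(predict(fit_bad, new_bad, interval = "confidence"))
nrow(p_bad)  # 288

# Fixed: bare column names plus data =, and newdata uses the same names.
fit_good <- lm(cpuUsed ~ memory + TimeStamp, data = data1)
new_good <- data.frame(memory = 2.0e9, TimeStamp = 1377678051 * 1000)
p_good <- predict(fit_good, new_good, interval = "confidence")
p_good  # a single row with columns fit, lwr, upr
```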

You can then cbind the predicted values with new to see which prediction belongs to which input row.
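For the original goal of predicting 5 minutes, 10 minutes and 24 hours ahead, one option is to build a newdata frame with one row per horizon and cbind the predictions onto it. This sketch swaps in a different model from the memory + Date one, a plain linear trend cpuUsed ~ TimeStamp, because future memory values are not known in advance; the data are synthetic and the trend is an assumption, not something established by the question. The names horizon_min and result are illustrative.

```r
# Hypothetical data standing in for one host's documents: one every 5 minutes
# for one day, TimeStamp in epoch milliseconds, cpuUsed with an upward trend
# plus noise.
set.seed(1)
n <- 288
data1 <- data.frame(TimeStamp = 1377605638752 + (0:(n - 1)) * 5 * 60 * 1000)
data1$cpuUsed <- 1e8 + 500 * (data1$TimeStamp - data1$TimeStamp[1]) / 1000 +
  rnorm(n, sd = 5e6)

# Assumed model: a linear trend in time only.
fit <- lm(cpuUsed ~ TimeStamp, data = data1)

# Horizons ahead of the last observation, converted to milliseconds.
horizon_min <- c(5, 10, 24 * 60)
new <- data.frame(TimeStamp = max(data1$TimeStamp) + horizon_min * 60 * 1000)

pred <- predict(fit, new, interval = "confidence")
# cbind keeps each prediction next to the input row it belongs to.
result <- cbind(new, horizon_min, pred)
print(result)
```

Each row of result pairs a horizon with its fit, lwr and upr. The confidence interval widens with distance from the observed data, so a straight-line trend extrapolated 24 hours ahead should be read with caution.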