I am trying to use a Cox model to predict the probability of failure after time 3 (the time variable is named stop).
library(survival)
bladder1 <- bladder[bladder$enum < 5, ]
coxmodel <- coxph(Surv(stop, event) ~ rx + size + number + cluster(id),
                  data = bladder1)
range(predict(coxmodel, bladder1, type = "lp"))
range(predict(coxmodel, bladder1, type = "risk"))
range(predict(coxmodel, bladder1, type = "terms"))
range(predict(coxmodel, bladder1, type = "expected"))
However, none of the outputs of the predict function are restricted to the 0-1 range. Is there a function for this, or how can I combine the lp prediction with the baseline hazard function to calculate a probability?
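The combination the question asks about can be written out directly. This is a minimal sketch, assuming survival 3.x, where basehaz(..., centered = TRUE) returns the baseline cumulative hazard for a subject at the covariate means and predict(type = "lp") uses that same reference point (the default for an unstratified model). For a Cox model S(t | x) = exp(-H0(t) * exp(lp)), so the failure probability is 1 minus that. The helper cumhaz_at() is hypothetical, written only for this example.

```r
library(survival)

bladder1 <- bladder[bladder$enum < 5, ]
coxmodel <- coxph(Surv(stop, event) ~ rx + size + number + cluster(id),
                  data = bladder1)

# Baseline cumulative hazard H0(t) for a subject at the covariate means,
# the same reference point predict(type = "lp") uses by default
H0 <- basehaz(coxmodel, centered = TRUE)

# Hypothetical helper: step-function lookup of H0 at time t
cumhaz_at <- function(t) {
  idx <- findInterval(t, H0$time)
  if (idx == 0) 0 else H0$hazard[idx]
}

lp <- predict(coxmodel, newdata = bladder1, type = "lp")
surv3 <- exp(-cumhaz_at(3) * exp(lp))  # S(3 | x) for each row
fail3 <- 1 - surv3                     # P(failure by time 3 | x)
range(fail3)
```

The step-function lookup matters because H0 only changes at event times; taking the last value at or before t matches how survfit reports values at intermediate times.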
Please read the help page for predict.coxph. None of those outputs are supposed to be probabilities. The linear predictor for a specific set of covariates is the log-hazard ratio relative to a hypothetical (and very possibly non-existent) case with the mean of all the predictor values. The 'expected' type comes closest to a probability, since it is a predicted number of events, but it requires specifying a time and would then need to be divided by the number at risk at the beginning of observation.
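The predict.coxph help page also states that a subject's survival probability equals exp(-expected), where 'expected' is evaluated at that subject's follow-up time. A minimal sketch of using that for the time-3 question, assuming it is acceptable to overwrite stop with 3 in a copy of the data (the name at3 is just for illustration):

```r
library(survival)

bladder1 <- bladder[bladder$enum < 5, ]
coxmodel <- coxph(Surv(stop, event) ~ rx + size + number + cluster(id),
                  data = bladder1)

# Evaluate every row at the same follow-up time (3) by overwriting stop;
# 'expected' is then the cumulative hazard H(3 | x) for each row
at3 <- bladder1
at3$stop <- 3
expected3 <- predict(coxmodel, newdata = at3, type = "expected")

# Survival probability = exp(-expected), so failure probability is 1 minus it
fail3 <- 1 - exp(-expected3)
range(fail3)
```

The event column is left untouched; it has to be present for the model frame but does not enter the 'expected' calculation.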
In the example offered on that help page, you can see that the sum of predicted events is close to the actual number:
> sum(predict(fit,type="expected"), na.rm=TRUE)
[1] 163
> sum(lung$status==2)
[1] 165
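The transcript above does not show how fit was created. A self-contained version, assuming fit is the model from the ?predict.coxph example (a stratified model on the lung data):

```r
library(survival)

# Model assumed to match the example on the ?predict.coxph help page
fit <- coxph(Surv(time, status) ~ age + ph.ecog + strata(inst), data = lung)

expected_events <- sum(predict(fit, type = "expected"), na.rm = TRUE)
observed_events <- sum(lung$status == 2)  # status: 1 = censored, 2 = dead
c(expected = expected_events, observed = observed_events)
```

The small gap is plausibly because rows with missing inst or ph.ecog are dropped from the fit but still counted in the observed total.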
I suspect you may want to work with the survfit function instead, since the probability of an event is 1 minus the probability of survival.
?survfit.coxph
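To get per-subject probabilities rather than a single curve, survfit accepts a newdata argument. A minimal sketch, assuming the first record of the first three patients is a reasonable set of example covariate profiles (the name profiles is illustrative):

```r
library(survival)

bladder1 <- bladder[bladder$enum < 5, ]
coxmodel <- coxph(Surv(stop, event) ~ rx + size + number + cluster(id),
                  data = bladder1)

# One row per patient (first record) to use as covariate profiles
profiles <- bladder1[bladder1$enum == 1, ][1:3, ]

# Predicted survival curves for those covariate profiles
sf <- survfit(coxmodel, newdata = profiles)

# Probability of failure by time 3 = 1 - S(3 | x), one value per profile
fail3 <- 1 - as.vector(summary(sf, times = 3)$surv)
fail3
```

as.vector() is used because, with several curves, summary() returns the survival values as a matrix (times by curves).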
The code for a similar question appears here, although as with this question it was neither accepted nor upvoted, so one is left wondering if anybody cared: Adding column of predicted Hazard Ratio to dataframe after Cox Regression in R
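Since you are using the bladder1 dataset, this would be the code for time=5 (change 5 to 3 for the time in your question). Without a newdata argument, survfit returns the curve for a pseudo-subject whose covariates equal the means stored in the model: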
summary(survfit(coxmodel), time=5)
#------------------
Call: survfit(formula = coxmodel)
time n.risk n.event survival std.err lower 95% CI upper 95% CI
5 302 26 0.928 0.0141 0.901 0.956
That call returns a list, with the survival prediction stored in the element named $surv:
> str(summary(survfit(coxmodel), time=5))
List of 14
$ n : int 340
$ time : num 5
$ n.risk : num 302
$ n.event : num 26
$ conf.int: num 0.95
$ type : chr "right"
$ table : Named num [1:7] 340 340 340 112 NA 51 NA
..- attr(*, "names")= chr [1:7] "records" "n.max" "n.start" "events" ...
$ n.censor: num 19
$ surv : num 0.928
$ std.err : num 0.0141
$ lower : num 0.901
$ upper : num 0.956
$ cumhaz : num 0.0744
$ call : language survfit(formula = coxmodel)
- attr(*, "class")= chr "summary.survfit"
> summary(survfit(coxmodel), time=5)$surv
[1] 0.9282944
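For the time in the question, the failure probability for the mean-covariate pseudo-subject is then just 1 minus that element. A short sketch, using times = (the full argument name of summary.survfit) with 3 instead of 5:

```r
library(survival)

bladder1 <- bladder[bladder$enum < 5, ]
coxmodel <- coxph(Surv(stop, event) ~ rx + size + number + cluster(id),
                  data = bladder1)

# Without newdata this is the curve for a pseudo-subject at the covariate means
surv3 <- summary(survfit(coxmodel), times = 3)$surv
fail3 <- 1 - surv3  # probability of failure by time 3
fail3
```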