Good afternoon,
I could post reproducible code and certainly will if everyone agrees that something is wrong, but right now I think my question is quite simple and someone will point me the right path.
I am working in a data set like this:
created_as_free_user t c
<fctr> <int> <int>
1 true 36 0
2 true 36 0
3 true 0 1
4 true 28 0
5 true 9 0
6 true 0 1
7 true 13 0
8 true 19 0
9 true 9 0
10 true 16 0
I fitted a Cox Regression model like this:
fit_train = coxph(Surv(time = t,event = c) ~ created_as_free_user ,data = teste)
summary(fit_train)
And received:
Call:
coxph(formula = Surv(time = t, event = c) ~ created_as_free_user,
data = teste)
n= 9000, number of events= 1233
coef exp(coef) se(coef) z Pr(>|z|)
created_as_free_usertrue -0.7205 0.4865 0.1628 -4.426 9.59e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
created_as_free_usertrue 0.4865 2.055 0.3536 0.6693
Concordance= 0.511 (se = 0.002 )
Rsquare= 0.002 (max possible= 0.908 )
Likelihood ratio test= 15.81 on 1 df, p=7e-05
Wald test = 19.59 on 1 df, p=9.589e-06
Score (logrank) test = 20.45 on 1 df, p=6.109e-06
So far so good. Next step: Predict the results on new data. I understand the different types of predictions that predict.coxph can give me (or at least I think I do). Let's use the type = "lp":
head(predict(fit_train,validacao,type = "lp"),n=20)
And get:
1 2 3 4 5 6 7 8 9 10
-0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854
11 12 13 14 15 16 17 18 19 20
-0.01208854 -0.01208854 0.70842049 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854
OK. But when I look at the data that I am trying to estimate:
# A tibble: 9,000 × 3
created_as_free_user t c
<fctr> <int> <int>
1 true 20 0
2 true 12 0
3 true 0 1
4 true 10 0
5 true 51 0
6 true 36 0
7 true 44 0
8 true 0 1
9 true 27 0
10 true 6 0
# ... with 8,990 more rows
It makes me confuse....
The type = "lp" isn't suppose to give you the linear predictions? For this data above that I am trying to estimate, since the created_as_free_user variable is equal to true, Am I wrong expecting the type = "lp" prediction to be exactaly -0.7205 (the coefficient of the model above)? Where does the -0.01208854 came? I suspect it's some sort of scale situation, but couldn't find the answer online.
My final objective is the h(t) that is given by the prediction type = "expected", but I am not all that comfortable using it because it uses this -0.01208854 value that I don't fully understand.
Thanks a lot
The Details section in
?predict.coxph
reads:To illustrate what this means, we can look at a simple example. Some fake data:
We fit a model and view predictions:
The predictions are the same as:
(Using this logic and some arithmetic we can back out from your results that you have 8849 "trues" out of 9000 observations for your
created_as_free_user
variable.)