I have got a question regarding the ordinal
package in R
or specifically regarding the predict.clm()
function. I would like to calculate the linear predictor
of an ordered probit
estimation. With the polr function of the MASS package the linear predictor can be accessed by object$lp. It gives me on value for each line and is in line with what I understand what the linear predictor is namely X_i'beta. If I however use the predict.clm(object, newdata,"linear.predictor")
on an ordered probit estimation with clm() I get a list with the elements eta1 and eta2,
- with one column each, if the newdata contains the dependent variable
- where each element contains as many columns as levels in the dependent variable, if the newdata doesn't contain the dependent variable
Unfortunately I don't have a clue what that means. Also in the documentations and papers of the author I don't find any information about it. Would one of you be so nice to enlighten me? This would be great.
Cheers,
AK
UPDATE (after comment):
Basic clm
model is defined like this (see clm tutorial for details):
Generating data:
library(ordinal)
set.seed(1)
test.data = data.frame(y=gl(4,5),
x=matrix(c(sample(1:4,20,T)+rnorm(20), rnorm(20)), ncol=2))
head(test.data) # two independent variables
test.data$y # four levels in y
Constructing models:
fm.polr <- polr(y ~ x) # using polr
fm.clm <- clm(y ~ x) # using clm
Now we can access thetas
and betas
(see formula above):
# Thetas
fm.polr$zeta # using polr
fm.clm$alpha # using clm
# Betas
fm.polr$coefficients # using polr
fm.clm$beta # using clm
Obtaining linear predictors (only parts without theta
on the right side of the formula):
fm.polr$lp # using polr
apply(test.data[,2:3], 1, function(x) sum(fm.clm$beta*x)) # using clm
New data generation:
# Contains only independent variables
new.data <- data.frame(x=matrix(c(rnorm(10)+sample(1:4,10,T), rnorm(10)), ncol=2))
new.data[1,] <- c(0,0) # intentionally for demonstration purpose
new.data
There are four types of predictions available for clm
model. We are interested in type=linear.prediction
, which returns a list with two matrices: eta1
and eta2
. They contain linear predictors for each observation in new.data
:
lp.clm <- predict(fm.clm, new.data, type="linear.predictor")
lp.clm
Note 1: eta1
and eta2
are literally equal. Second is just a rotation of eta1
by 1 in j
index. Thus, they leave left side and right side of linear predictor scale opened respectively.
all.equal(lp.clm$eta1[,1:3], lp.clm$eta2[,2:4], check.attributes=FALSE)
# [1] TRUE
Note 2: Prediction for first line in new.data
is equal to thetas
(as far as we set this line to zeros).
all.equal(lp.clm$eta1[1,1:3], fm.clm$alpha, check.attributes=FALSE)
# [1] TRUE
Note 3: We can manually construct such predictions. For instance, prediction for second line in new.data
:
second.line <- fm.clm$alpha - sum(fm.clm$beta*new.data[2,])
all.equal(lp.clm$eta1[2,1:3], second.line, check.attributes=FALSE)
# [1] TRUE
Note 4: If new.data
contains response variable, then predict
returns only linear predictor for specified level of y
. Again we can check it manually:
new.data$y <- gl(4,3,length=10)
lp.clm.y <- predict(fm.clm, new.data, type="linear.predictor")
lp.clm.y
lp.manual <- sapply(1:10, function(i) lp.clm$eta1[i,new.data$y[i]])
all.equal(lp.clm.y$eta1, lp.manual)
# [1] TRUE