I want to run a multinomial logit in R and have used two libraries, nnet and mlogit, which produce different results and report different types of statistics. My questions are:
1. What is the source of the discrepancy between the coefficients and standard errors reported by nnet and those reported by mlogit?
2. I would like to report my results to a LaTeX file using stargazer. When doing so, there is a problematic tradeoff:
   - If I use the results from mlogit, I get the statistics I want, such as pseudo R-squared, but the output is in long format (see example below).
   - If I use the results from nnet, the format is as expected, but it reports statistics that I am not interested in, such as AIC, and does not include, for example, pseudo R-squared.
I would like to have the statistics reported by mlogit in the formatting of nnet when I use stargazer.
Here is a reproducible example, with three choice alternatives:
library(mlogit)
df = data.frame(c(0,1,1,2,0,1,0), c(1,6,7,4,2,2,1), c(683,276,756,487,776,100,982))
colnames(df) <- c('y', 'col1', 'col2')
mydata = df
mldata <- mlogit.data(mydata, choice="y", shape="wide")
mlogit.model1 <- mlogit(y ~ 1| col1+col2, data=mldata)
The TeX output, when compiled, is in what I refer to as "long format", which I consider undesirable:
Now, using nnet:
library(nnet)
library(stargazer)
mlogit.model2 = multinom(y ~ 1 + col1+col2, data=mydata)
stargazer(mlogit.model2)
This gives the TeX output:
which is in the "wide" format that I want. Note the different coefficients and standard errors.
To my knowledge, there are three R packages that allow the estimation of the multinomial logistic regression model: mlogit, nnet and globaltest (from Bioconductor). I do not consider here the mnlogit package, a faster and more efficient implementation of mlogit.
All the above packages use different algorithms that, for small samples, give different results. These differences vanish for moderate sample sizes (try with n <- 100).
Consider the following data generating process, taken from James Keirstead's blog:
The model parameters estimated by the three packages are respectively:
The mlogit command of globaltest fits the model without using a reference outcome category, hence the usual parameters can be calculated as follows:
Concerning the estimation of the parameters in the three packages, the method used in mlogit::mlogit is explained in detail here.
In nnet::multinom the model is a neural network with no hidden layers, no bias nodes and a softmax output layer; in our case there are 3 input units and 3 output units:
Maximum conditional likelihood is the method used in multinom for model fitting.
The parameters of multinomial logit models are estimated in globaltest::mlogit using maximum likelihood, working with an equivalent log-linear model and the Poisson likelihood. The method is described here.
For models estimated by multinom, McFadden's pseudo R-squared can be easily calculated as follows:
At this point, using stargazer, I generate a report for the model estimated by mlogit::mlogit that is as similar as possible to the report of multinom. The basic idea is to substitute the estimated coefficients and probabilities in the object created by multinom with the corresponding estimates of mlogit.
Here is the result:
Now I am working on the last issue: how to visualize loglik, pseudo R2 and other information in the above stargazer output.
If you are using stargazer, you can use omit to remove unwanted rows or references. Here is a quick example; hopefully it will point you in the right direction.
N.B. My assumption is that you are using RStudio and rmarkdown with knitr.
Result:
Note that the second image omits 1:col1, 2:col1, 1:col2 and 2:col2.
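Along the same lines as omit, stargazer also documents omit.stat and add.lines, which may get closer to what the question asks for: the wide nnet layout, without AIC, with a pseudo R-squared row added by hand. The following is only a rough sketch under some assumptions: it reuses the data from the question, it computes McFadden's pseudo R-squared as 1 minus the ratio of the full-model log-likelihood to the intercept-only log-likelihood, and it repeats the same value in every column because these are model-level statistics. The names full_fit, null_fit, pseudo_r2 and n_cols are illustrative, not from either package. With only seven observations the numbers themselves should not be trusted.

```r
library(nnet)

# Data from the question's reproducible example
mydata <- data.frame(
  y    = c(0, 1, 1, 2, 0, 1, 0),
  col1 = c(1, 6, 7, 4, 2, 2, 1),
  col2 = c(683, 276, 756, 487, 776, 100, 982)
)

# Full model and intercept-only (null) model, both fitted with multinom
full_fit <- multinom(y ~ col1 + col2, data = mydata, trace = FALSE)
null_fit <- multinom(y ~ 1, data = mydata, trace = FALSE)

# McFadden's pseudo R-squared: 1 - logLik(full) / logLik(null)
ll_full   <- as.numeric(logLik(full_fit))
ll_null   <- as.numeric(logLik(null_fit))
pseudo_r2 <- 1 - ll_full / ll_null

# stargazer is treated as optional here; the table is skipped if it is missing
if (requireNamespace("stargazer", quietly = TRUE)) {
  # One table column per non-reference outcome category
  n_cols <- nrow(coef(full_fit))
  stargazer::stargazer(
    full_fit,
    type      = "latex",
    omit.stat = "aic",  # drop the AIC row
    add.lines = list(   # hand-made rows: label first, then one value per column
      c("Log likelihood",     rep(sprintf("%.3f", ll_full),   n_cols)),
      c("McFadden pseudo R2", rep(sprintf("%.3f", pseudo_r2), n_cols))
    )
  )
}
```

The extra rows are plain text as far as stargazer is concerned, so the same approach should work for any other statistic you can compute yourself from the fitted object.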