Assesing the goodness of fit for the multinomial l

2020-06-24 06:28发布

I use the multinom() function from the nnet package to run the multinomial logistic regression in R. The nnet package does not include p-value calculation and t-statistic calculation. I found a way to calculate the p-values using the two tailed z-test from this page. To give one example of calculating a test statistic for a multinom logit (not really a t-stat, but an equivalent) I calculate the Wald's statistic:

mm<-multinom(Empst ~ Agegroup + Marst + Education + State, 
             data = temp,weight=Weight)
W <- (summary(mm1)$coefficients)^2/(summary(mm1)$standard.errors)^2

I take the square of a coefficient and divide by the square of the coefficient's standard error. However, the likelihood-ratio test is the preferable measure of a goodness of fit for the logistic regressions. I do not know how to write code that will calculate the likelihood ratio statistic for each coefficient due to the incomplete understanding of the likelihood function. What would be the way to calculate the likelihood-ratio statistic for each coefficient using the output from the multinom() function? Thanks for your help.

3条回答
在下西门庆
2楼-- · 2020-06-24 06:49

Let's look at predicting Sepal.Length from the iris dataset using Species (a categorical variable) and Petal.Length (a continuous variable). Let's start by converting our factor variable into multiple binary variables using model.matrix and building our neural network:

library(nnet)
data(iris)
mat <- as.data.frame(model.matrix(~Species+Petal.Length+Sepal.Length, data=iris))
mm <- multinom(Sepal.Length~.+0, data=mat, trace=F)

Now we can run a likelihood ratio test for a variable in our model:

library(lmtest)
lrtest(mm, "Speciesversicolor")
# Likelihood ratio test
# 
# Model 1: Sepal.Length ~ `(Intercept)` + Speciesversicolor + Speciesvirginica + 
#     Petal.Length + 0
# Model 2: Sepal.Length ~ `(Intercept)` + Speciesvirginica + Petal.Length - 
#     1
#   #Df  LogLik  Df  Chisq Pr(>Chisq)
# 1 136 -342.02                      
# 2 102 -346.75 -34 9.4592          1

To run the likelihood ratio test for all your variables, I guess you could just use a loop and run for each variable name. I've extracted just the p-values in this loop.

for (var in mm$coefnames[-1]) {
  print(paste(var, "--", lrtest(mm, var)[[5]][2]))
}
# [1] "Speciesversicolor -- 0.999990077592342"
# [1] "Speciesvirginica -- 0.998742545590864"
# [1] "Petal.Length -- 3.36995663002528e-14"
查看更多
Fickle 薄情
3楼-- · 2020-06-24 06:51

Use the Anova function in the car package for the likelihood-ratio test of each term in your model.

library(nnet)
data(iris)


mm <- multinom(Species ~ ., data=iris, trace=F)

### car package
library(car)
Anova(mm)
查看更多
beautiful°
4楼-- · 2020-06-24 07:00

From the response of @jolisber i extracted a function so anyone can do this and store the values in a df. Well, i stored the full character vector in the df.

likehoodmultinom2 <- function(model_lmm) 
{

  i <- 1
  values<- c("No funciona") 

  for (var in model_lmm$coefnames[-1]) { # Qutiamos el -1 de coefnames para no obener un NA

  values[i] =(paste(var, "--", lrtest(model_lmm, var)[[5]][2]))
  i=i+1

  }
  return (values)
}

However i cant get the first element (variable) p-value. I dont know why. And i cant ignore the [-1] in model_lmm$coefnames. EDITED. I edited i=0 to i=1; forgot that R vectors start at that :D.

Hope this works for everyone :D

EDIT 2

Also did 1 so it can store in a df.

likehoodmultinom_p <- function(model_lmm) 
{

  i <- 1

  variables <-c("No funciona")
  values <- c("No funciona") 


  for (var in model_lmm$coefnames[-1]) { 

  variables[i] =paste(var)
  values[i]= lrtest(model_lmm, var)[[5]][2]
  i=i+1
   ## Contributed to stack at: 
  }
  return (data.frame(variables,values))
}
查看更多
登录 后发表回答