-->

R mlogit model, computationally singular

Posted 2020-03-25 15:19

Question:

I've spent the whole of today battling to format my data appropriately for mlogit (updated after finding a bug via BondedDust's table(TM) suggestion):

raw <- read.csv("C:\\Users\\Andy\\Desktop\\research\\Oxford\\Prefs\\rData.csv", header = TRUE, row.names = NULL)
raw <- na.omit(raw)

library(mlogit)

TM <- mlogit.data(raw, choice = "selected", shape = "long", alt.var = "dishId", chid.var = "individuals", drop.index = TRUE)

Where I get stuck is when I try to fit the model:

model <- mlogit(selected ~ food + plate | sex + age +hand, data = TM)

Error in solve.default(H, g[!fixed]) : system is computationally singular: reciprocal condition number = 6.26659e-18

I would really appreciate some help on the topic. I'm afraid I'm going a little bananas with it.

The data come from an experiment in which thousands of people choose between pairs of plates of food. We vary how the food looks (either Angular or Circular) and how the plate is shaped (also either Angular or Circular).

With best wishes, Andy.

PS: I'm afraid I'm a newbie when it comes to statistics questions on Stack Overflow.

Answer 1:

mlogit.data cannot interpret your dishId as the alternative index (alt.var) because different choices use different key pairs. For example, the first choice in your .csv file has "TS" and "RS" as alternative index keys, whereas choice 3634 has "RR" and "RS". You also did not specify the names of the alternatives (alt.levels). Because alt.levels is not filled in, mlogit.data tries to detect the alternatives from the alternative index, which it cannot interpret correctly. This is where everything goes wrong: the 'food' and 'plate' variables are not interpreted as alternatives but are treated as individual-specific variables, which eventually causes the singularity.
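One way to confirm the inconsistency before calling mlogit.data is to compare the set of keys used in each choice situation. The following is a minimal base-R sketch on made-up data; the `toy` data frame and its values are assumptions for illustration, and only the column names `individuals` and `dishId` come from the question.

```r
# Toy long-format data: two rows (alternatives) per choice situation.
# The values are invented; only the column names mirror the question.
toy <- data.frame(
  individuals = c(1, 1, 2, 2, 3, 3),
  dishId      = c("TS", "RS", "TS", "RS", "RR", "RS"),
  stringsAsFactors = FALSE
)

# Build one sorted key set per choice situation, e.g. "RS/TS".
key_sets <- tapply(toy$dishId, toy$individuals,
                   function(k) paste(sort(k), collapse = "/"))

# More than one distinct key set means the alternative index is inconsistent.
distinct_sets <- unique(as.vector(key_sets))
consistent <- length(distinct_sets) == 1

print(distinct_sets)
print(consistent)
```

If `consistent` is FALSE on the real data, the keys need to be recoded (or the alternatives named explicitly) before the data can be indexed by alternative.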

You have two options to fix the issue. You can give the actual alternatives as input to mlogit.data through the alt.levels parameter:

TM <- mlogit.data(raw, choice = "selected", shape = "long", alt.levels = c("food","plate"),chid.var = "individuals",drop.index=TRUE)
model1 <- mlogit(selected ~ food + plate | sex + age +hand, data = TM)

Alternatively, you could make your index keys consistent so that you can pass them in via alt.var. mlogit.data will then be able to guess correctly what your alternatives are:

raw[,3] <- rep(1:2,nrow(raw)/2) # use 1 and 2 as unique alternative keys for all choices
TM <- mlogit.data(raw, choice = "selected", shape = "long", alt.var="dishId", chid.var = "individuals")
model2 <- mlogit(selected ~ food + plate | sex + age + hand, data = TM)
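Note that rep(1:2, nrow(raw)/2) relies on every choice situation occupying exactly two adjacent rows. If that ordering is not guaranteed, the keys could instead be numbered within each choice situation. The sketch below uses base R's ave() on made-up data; the `toy` data frame and the new `altId` column are hypothetical names chosen for illustration, not part of the original data.

```r
# Toy data in which the two rows of a choice situation are NOT adjacent.
toy <- data.frame(
  individuals = c(10, 11, 10, 12, 11, 12),
  dishId      = c("TS", "RR", "RS", "RS", "RS", "TT"),
  stringsAsFactors = FALSE
)

# Number the rows 1, 2, ... within each choice situation,
# regardless of where those rows sit in the data frame.
toy$altId <- ave(seq_len(nrow(toy)), toy$individuals, FUN = seq_along)

# Every choice situation should now carry exactly the keys 1 and 2.
keys_ok <- all(tapply(toy$altId, toy$individuals,
                      function(k) identical(sort(k), 1:2)))

print(toy)
print(keys_ok)
```

A column built this way could then be passed as alt.var in place of the overwritten third column.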

We verify that both models are indeed identical. The results of model 1:

> summary(model1)

Call:
mlogit(formula = selected ~ food + plate | sex + age + hand, 
    data = TM, method = "nr", print.level = 0)

Frequencies of alternatives:
   food   plate 
0.42847 0.57153 

nr method
4 iterations, 0h:0m:0s 
g'(-H)^-1g = 0.00423 
successive function values within tolerance limits 

Coefficients :
                    Estimate Std. Error t-value  Pr(>|t|)    
plate:(intercept) -0.0969627  0.0764117 -1.2689 0.2044589    
foodCirc           1.0374881  0.0339559 30.5540 < 2.2e-16 ***
plateCirc         -0.0064866  0.0524547 -0.1237 0.9015835    
plate:sexmale     -0.0811157  0.0416113 -1.9494 0.0512512 .  
plate:age16-34     0.1622542  0.0469167  3.4583 0.0005435 ***
plate:age35-54     0.0312484  0.0555634  0.5624 0.5738492    
plate:age55-74     0.0556696  0.0836248  0.6657 0.5055987    
plate:age75+       0.1057646  0.2453797  0.4310 0.6664508    
plate:handright   -0.0177260  0.0539510 -0.3286 0.7424902    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Log-Likelihood: -8284.6
McFadden R^2:  0.097398 
Likelihood ratio test : chisq = 1787.9 (p.value = < 2.22e-16)

Versus the results of model 2. Note that the alternatives are correctly identified, but the names are not explicitly added to the model:

> summary(model2)

Call:
mlogit(formula = selected ~ food + plate | sex + age + hand, 
    data = TM, method = "nr", print.level = 0)

Frequencies of alternatives:
      1       2 
0.42847 0.57153 

nr method
4 iterations, 0h:0m:0s 
g'(-H)^-1g = 0.00423 
successive function values within tolerance limits 

Coefficients :
                Estimate Std. Error t-value  Pr(>|t|)    
2:(intercept) -0.0969627  0.0764117 -1.2689 0.2044589    
foodCirc       1.0374881  0.0339559 30.5540 < 2.2e-16 ***
plateCirc     -0.0064866  0.0524547 -0.1237 0.9015835    
2:sexmale     -0.0811157  0.0416113 -1.9494 0.0512512 .  
2:age16-34     0.1622542  0.0469167  3.4583 0.0005435 ***
2:age35-54     0.0312484  0.0555634  0.5624 0.5738492    
2:age55-74     0.0556696  0.0836248  0.6657 0.5055987    
2:age75+       0.1057646  0.2453797  0.4310 0.6664508    
2:handright   -0.0177260  0.0539510 -0.3286 0.7424902    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Log-Likelihood: -8284.6
McFadden R^2:  0.097398 
Likelihood ratio test : chisq = 1787.9 (p.value = < 2.22e-16)


Answer 2:

This is more of a comment than an answer (I don't have enough reputation points to comment!). However, I wasn't able to reproduce your code, as there isn't any age column in your rData.csv.



Answer 3:

I had the same problem when using:

df.long <- mlogit.data(Train, choice = "Loan.Type" ,shape="wide")
mod3.rural1 <- mlogit(Loan.Type ~ 1 | sex + married + age + havejob + educ + political.afl + ethnicity + region + income + liquid.assets + class.of.HH, data = df.long, reflevel = "No.Loan")

This failed even though the correlation between income and liquid assets was only 0.50. After removing outliers, however, it worked fine. You can remove the outliers as follows:

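# Keep rows whose value is not flagged as a boxplot outlier. Logical indexing
# avoids dropping every row when there are no outliers, as -which() would.
df <- df[!(df$column %in% boxplot(df$column, plot = FALSE)$out), ]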
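A pairwise correlation of 0.50 is well short of perfect collinearity, so it may also be worth checking directly whether the design matrix is rank deficient. The following is a minimal base-R sketch on made-up data; the `toy` data frame and its `liquid` column are hypothetical and deliberately constructed to be an exact linear combination of the other columns.

```r
set.seed(1)

# Made-up predictors for illustration only.
toy <- data.frame(
  income = rnorm(20),
  age    = rnorm(20)
)

# Hypothetical redundant column: an exact linear combination of the others.
toy$liquid <- 2 * toy$income - toy$age

# Design matrix with an intercept and the three predictors.
X <- model.matrix(~ income + age + liquid, data = toy)

# A rank below the number of columns signals perfect collinearity.
rank_X <- qr(X)$rank
rank_deficient <- rank_X < ncol(X)

print(c(rank = rank_X, columns = ncol(X)))
print(rank_deficient)
```

When the rank equals the number of columns, the singularity is more likely to come from the data layout or from extreme values than from redundant predictors.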

Hope this works for you.



Tags: r mlogit