I am trying to model count data: the number of absence days per worker in a year (the dependent variable). I have a set of predictors, including information about the workers, their jobs, etc., and most of them are categorical variables. Consequently, there is a large number of coefficients to estimate (83), but as I have more than 600,000 rows, I think this should not be a problem. In addition, there are no missing values in my dataset.
My dependent variable contains a lot of zeros, so I would like to estimate a zero-inflated model (Poisson or negative binomial) with the zeroinfl function of the pscl package, using this code:
zpoisson <- zeroinfl(formule,data=train,dist = "poisson",link="logit")
but I get the following error after a long running time:
Error in solve.default(as.matrix(fit$hessian)) : system is computationally singular: reciprocal condition number = 1.67826e-41
I think this error means that some of my covariates are correlated, but that does not seem to be the case when I check pairwise correlations and the Variance Inflation Factor (VIF). Moreover, I have also estimated other models (logit, and Poisson and negative binomial count models) without problems, even though these types of models are also sensitive to correlated predictors.
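For reference, here is a minimal sketch of one more collinearity check that goes beyond pairwise correlations: comparing the rank of the design matrix with its number of columns. It runs on toy data, not on my dataset; the data frame toy and its columns (sector, age, age_months) are hypothetical names made up for the illustration, and age_months is deliberately constructed as an exact multiple of age so that the check has something to find.

```r
# Toy data standing in for `train`; all column names are hypothetical.
set.seed(1)
n <- 200
toy <- data.frame(
  sector = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  age    = rnorm(n, mean = 40, sd = 10)
)
# Deliberately add a column that is an exact linear function of `age`.
toy$age_months <- toy$age * 12

# Design matrix with dummy coding, built from a one-sided formula.
X <- model.matrix(~ sector + age + age_months, data = toy)

# A rank below the number of columns means an exact linear dependence.
qr_X   <- qr(X)
rank_X <- qr_X$rank
n_cols <- ncol(X)

# Columns that the pivoting moved past the rank are the redundant ones.
redundant <- colnames(X)[qr_X$pivot[seq_len(n_cols) > rank_X]]
print(redundant)
```

On real data the same idea would use the model formula and the training data frame in place of the toy ones. A full-rank design matrix only rules out exact linear dependence; it says nothing about how well-conditioned the Hessian of the fitted zero-inflated model is.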
Do you have any idea why the zeroinfl function does not work? Could it be because I have too many predictors, even if they are not correlated? I have already tried to remove some predictors with the Boruta algorithm, but it kept all of them.
Thanks in advance for your help.