I'm trying to use the fitdist() function from the fitdistrplus package to fit my data to different distributions. Let's say that my data looks like:
x = c(1.300000, 1.220000, 1.160000, 1.300000, 1.380000, 1.240000,
1.150000, 1.180000, 1.350000, 1.290000, 1.150000, 1.240000,
1.150000, 1.120000, 1.260000, 1.120000, 1.460000, 1.310000,
1.270000, 1.260000, 1.270000, 1.180000, 1.290000, 1.120000,
1.310000, 1.120000, 1.220000, 1.160000, 1.460000, 1.410000,
1.250000, 1.200000, 1.180000, 1.830000, 1.670000, 1.130000,
1.150000, 1.170000, 1.190000, 1.380000, 1.160000, 1.120000,
1.280000, 1.180000, 1.170000, 1.410000, 1.550000, 1.170000,
1.298701, 1.123595, 1.098901, 1.123595, 1.110000, 1.420000,
1.360000, 1.290000, 1.230000, 1.270000, 1.190000, 1.180000,
1.298701, 1.136364, 1.098901, 1.123595, 1.316900, 1.281800,
1.239400, 1.216989, 1.785077, 1.250800, 1.370000)
Next, if I run fitdist(x, "gamma") everything is fine, but if I use fitdist(x, "beta") instead, I get the following error:
Error in start.arg.default(data10, distr = distname) :
values must be in [0-1] to fit a beta distribution
OK, so I'm not a native English speaker, but as far as I understand, this method requires the data to be in the range [0,1], so I scale it using x_scaled = (x-min(x))/max(x). This gives me a vector with values in that range that correlates perfectly with the original vector x.
Because x_scaled is of class matrix, I convert it into a numeric vector using as.numeric(), and then fit the model with fitdist(x_scale,"beta").
This time I get the following error:
Error in fitdist(x_scale, "beta") :
the function mle failed to estimate the parameters, with the error code 100
After that I ran some searches but didn't find anything useful. Does anybody have an idea of what's going wrong here? Thank you.
Reading the source code shows that the default estimation method of fitdist is mle, which calls mledist from the same package. mledist constructs a negative log-likelihood for the distribution you have chosen and uses optim or constrOptim to minimize it numerically. If anything goes wrong during the numerical optimization, you get the error message you saw.
It seems the error occurs because x_scaled contains 0 or 1: the beta density at those points can be 0 or infinite, so the negative log-likelihood is not finite there and the numerical optimization simply breaks. One dirty trick is to let x_scaled <- (x - min(x) + 0.001) / (max(x) - min(x) + 0.002), so that there is neither 0 nor 1 in x_scaled, and fitdist will work.
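As a rough illustration of the explanation above, here is a minimal sketch in R. It uses only the first twelve values of the question's data to stay short, and the names x_naive, ll_at_zero and eps are introduced here for illustration. The shape values 2 and 2 are arbitrary, chosen only to show that the log-density at the boundary is not finite. The fit itself is attempted only if fitdistrplus is installed.

```r
# First twelve values of the data from the question (a subset, for brevity).
x <- c(1.30, 1.22, 1.16, 1.30, 1.38, 1.24,
       1.15, 1.18, 1.35, 1.29, 1.15, 1.24)

# Scaling used in the question: the minimum maps to exactly 0.
x_naive <- (x - min(x)) / max(x)

# Log-density of a beta distribution at 0 (arbitrary shapes 2 and 2).
# The density is 0 there, so its log is -Inf and the likelihood breaks down.
ll_at_zero <- dbeta(0, shape1 = 2, shape2 = 2, log = TRUE)

# Shifted scaling from the answer: values stay strictly inside (0, 1).
eps <- 0.001
x_scaled <- (x - min(x) + eps) / (max(x) - min(x) + 2 * eps)

# Attempt the fit only when the package is available.
if (requireNamespace("fitdistrplus", quietly = TRUE)) {
  fit <- fitdistrplus::fitdist(x_scaled, "beta")
  print(fit)
}
```

The offset 0.001 is arbitrary; the estimated shape parameters depend on it to some degree, so it may be worth checking how sensitive the fit is to that choice.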