This is a scikit-learn error that I get when I do:
my_estimator = LassoLarsCV(fit_intercept=False, normalize=False, positive=True, max_n_alphas=1e5)
Note that if I decrease max_n_alphas from 1e5 down to 1e4, I no longer get this error.
Does anyone have an idea of what's going on?
The error happens when I call
my_estimator.fit(x, y)
I have 40k data points in 40 dimensions.
The full stack trace looks like this:
File "/usr/lib64/python2.7/site-packages/sklearn/linear_model/least_angle.py", line 1113, in fit
axis=0)(all_alphas)
File "/usr/lib64/python2.7/site-packages/scipy/interpolate/polyint.py", line 79, in __call__
y = self._evaluate(x)
File "/usr/lib64/python2.7/site-packages/scipy/interpolate/interpolate.py", line 498, in _evaluate
out_of_bounds = self._check_bounds(x_new)
File "/usr/lib64/python2.7/site-packages/scipy/interpolate/interpolate.py", line 525, in _check_bounds
raise ValueError("A value in x_new is below the interpolation "
ValueError: A value in x_new is below the interpolation range.
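For reference, a minimal version of my setup looks like this; the random data below is just a stand-in for my real x and y (and may not trigger the error, which seems to be data-dependent):

    import numpy as np
    from sklearn.linear_model import LassoLarsCV

    rng = np.random.RandomState(0)
    x = rng.rand(40000, 40)          # stand-in for my 40k points in 40 dimensions
    y = x.dot(rng.rand(40))          # stand-in for my real target

    my_estimator = LassoLarsCV(fit_intercept=False, normalize=False,
                               positive=True, max_n_alphas=1e5)
    my_estimator.fit(x, y)           # this is the call that raises the error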
There must be something particular to your data. LassoLarsCV() seems to be working correctly with this synthetic example of fairly well-behaved data (this is in sklearn 0.16, so I don't have the positive=True option):
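A sketch of the kind of check I mean; make_regression and the exact sizes here are illustrative stand-ins, not necessarily the verbatim example:

    from sklearn.datasets import make_regression
    from sklearn.linear_model import LassoLarsCV

    # Fairly well-behaved data: dense Gaussian design, sparse true coefficients
    x, y = make_regression(n_samples=40000, n_features=40, n_informative=10,
                           noise=0.1, random_state=0)

    clf = LassoLarsCV(fit_intercept=False, normalize=False, max_n_alphas=1e5)
    clf.fit(x, y)

    print(clf.alphas_.shape)  # at most ndims+1 = 41 knots for data like this
    print(clf.alpha_)         # the cross-validated optimal alpha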
I'm not sure why you would want to use a very large max_n_alphas anyway. While I don't know why 1e+4 works and 1e+5 doesn't in your case, I suspect the paths you get from max_n_alphas=ndims+1 and max_n_alphas=1e+4 (or whatever) would be identical for well-behaved data. Also, the optimal alpha estimated by cross-validation in
clf.alpha_
is going to be identical. Check out the Lasso path using LARS example for what alpha is trying to do.

Also, from the LassoLars documentation:

    alphas_ : array, shape (n_alphas + 1,)
        Maximum of covariances (in absolute value) at each iteration.
        n_alphas is either max_iter, n_features, or the number of nodes
        in the path with correlation greater than alpha, whichever is
        smaller.

so it makes sense that we end up with alphas_ of size ndims+1 (i.e. n_features+1) above.
P.S. Tested with sklearn 0.17.1 and positive=True as well, and also with a mix of positive and negative coefficients; same result: alphas_ is ndims+1 or less.
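Here is a sketch of that check; the synthetic data is again an illustrative stand-in:

    import numpy as np
    from sklearn.linear_model import LassoLarsCV

    rng = np.random.RandomState(0)
    x = rng.randn(5000, 40)
    y = x.dot(rng.rand(40)) + 0.1 * rng.randn(5000)  # positive ground-truth coefs

    # ndims+1 versus the two values from the question
    for n_alphas in (41, 1e4, 1e5):
        clf = LassoLarsCV(fit_intercept=False, normalize=False,
                          positive=True, max_n_alphas=n_alphas)
        clf.fit(x, y)
        # Expect alphas_ of size ndims+1 or less, and alpha_ essentially unchanged
        print(n_alphas, clf.alphas_.shape, clf.alpha_)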