Reversing scaled values in LibSVM

Posted 2019-07-07 05:38

Question:

I am using Support Vector Regression for forecasting in LibSVM. Everything works, but there is one question I cannot get out of my head.

With LibSVM, I first scale my training and testing sets to the same range and then select the optimal parameters. After running svm-train and svm-predict, I get the forecasted values for the testing set in scaled form. I then reverse the scaling in Excel and calculate the Mean Absolute Percentage Error (MAPE).

I am pretty sure that scaling in LibSVM works by subtracting the feature's minimum from the value and then dividing by the feature's range. However, I wanted to check whether the values I scaled by hand match the values scaled by LibSVM. Before splitting the dataset into two sets, I find the minimum and maximum of each feature and then scale as described above. The scaled values that LibSVM produces for the training and testing sets are not exactly the same as the ones I calculate by hand; they are only roughly close. Does anyone know why they differ?

Another question: how can I calculate MAPE in LibSVM?

Answer 1:

If you inspect the file svm-scale.c you will find that the formula that scales data is:

value = y_lower + (y_upper-y_lower) * (value - y_min)/(y_max-y_min);

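Here y_lower and y_upper are the scaling limits for y, and y_min and y_max are the smallest and largest values found in the data. Features are scaled with a formula of the same form, using their own lower and upper limits and the per-feature minimum and maximum.

So, as you can see, the scaled value is not computed quite the way you supposed (subtracting the minimum and then dividing by the range for that feature): the result is additionally stretched and shifted into the interval between the lower and upper limits. Your version corresponds to limits of 0 and 1. If you want to recover the real value, you only have to invert the formula.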
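Inverting the formula means solving it for the original value. The following is a minimal Python sketch, not part of LibSVM: the function names `scale`, `unscale` and `mape` are illustrative, and the numbers are made up. It also shows one way to compute MAPE on the unscaled predictions, since for regression svm-predict itself reports the mean squared error and the squared correlation coefficient rather than MAPE.

```python
def scale(value, v_min, v_max, lower=-1.0, upper=1.0):
    """Forward formula, same form as the one quoted from svm-scale.c."""
    return lower + (upper - lower) * (value - v_min) / (v_max - v_min)


def unscale(scaled, v_min, v_max, lower=-1.0, upper=1.0):
    """Inverse of scale(): the formula solved for the original value."""
    return v_min + (scaled - lower) * (v_max - v_min) / (upper - lower)


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent; actual values must be nonzero."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical numbers, for illustration only.
y_min, y_max = 50.0, 150.0              # target min/max used when scaling
actual = [100.0, 120.0, 80.0]           # true values in original units
scaled_predictions = [0.1, 0.3, -0.5]   # e.g. values read from the svm-predict output file

predictions = [unscale(s, y_min, y_max) for s in scaled_predictions]
print(predictions)
print(mape(actual, predictions))
```

The limits and the min/max passed to `unscale` have to be the same ones that were used for scaling, otherwise the recovered values will be off.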

Example:

If you take one of the many example datasets available on the LibSVM site, such as the covtype dataset, and open it, you will see a file like this one:

1 1:2596 2:51 3:3 4:258 6:510 7:221 8:232 9:148 10:6279 11:1 43:1
1 1:2590 2:56 3:2 4:212 5:-6 6:390 7:220 8:235 9:151 10:6225 11:1 43:1
2 1:2804 2:139 3:9 4:268 5:65 6:3180 7:234 8:238 9:135 10:6121 11:1 26:1
2 1:2785 2:155 3:18 4:242 5:118 6:3090 7:238 8:238 9:122 10:6211 11:1 44:1
1 1:2595 2:45 3:2 4:153 5:-1 6:391 7:220 8:234 9:150 10:6172 11:1 43:1
...

Now let's scale it using:

./svm-scale -s covtype.libsvm.binary.range  covtype.libsvm.binary > covtype.libsvm.binary.scale

This generates two files: the .range file, which contains the information needed to reproduce the scaling (the minimum and maximum of each column), and the .scale file, which is the scaled output and looks like this:

1 1:-0.262631 2:-0.716667 3:-0.909091 4:-0.630637 5:-0.552972 6:-0.856681 7:0.740157 8:0.826772 9:0.165354 10:0.750732 11:1 12:-1 13:-1 14:-1 15:-1 16:-1 17:-1 18:-1 19:-1 20:-1 21:-1 22:-1 23:-1 24:-1 25:-1 26:-1 27:-1 28:-1 29:-1 30:-1 31:-1 32:-1 33:-1 34:-1 35:-1 36:-1 37:-1 38:-1 39:-1 40:-1 41:-1 42:-1 43:1 44:-1 45:-1 46:-1 47:-1 48:-1 49:-1 50:-1 51:-1 52:-1 53:-1 54:-1 
1 1:-0.268634 2:-0.688889 3:-0.939394 4:-0.696492 5:-0.568475 6:-0.890403 7:0.732283 8:0.850394 9:0.188976 10:0.735675 11:1 12:-1 13:-1 14:-1 15:-1 16:-1 17:-1 18:-1 19:-1 20:-1 21:-1 22:-1 23:-1 24:-1 25:-1 26:-1 27:-1 28:-1 29:-1 30:-1 31:-1 32:-1 33:-1 34:-1 35:-1 36:-1 37:-1 38:-1 39:-1 40:-1 41:-1 42:-1 43:1 44:-1 45:-1 46:-1 47:-1 48:-1 49:-1 50:-1 51:-1 52:-1 53:-1 54:-1 
2 1:-0.0545273 2:-0.227778 3:-0.727273 4:-0.616321 5:-0.385013 6:-0.106365 7:0.84252 8:0.874016 9:0.0629921 10:0.706678 11:1 12:-1 13:-1 14:-1 15:-1 16:-1 17:-1 18:-1 19:-1 20:-1 21:-1 22:-1 23:-1 24:-1 25:-1 26:1 27:-1 28:-1 29:-1 30:-1 31:-1 32:-1 33:-1 34:-1 35:-1 36:-1 37:-1 38:-1 39:-1 40:-1 41:-1 42:-1 43:-1 44:-1 45:-1 46:-1 47:-1 48:-1 49:-1 50:-1 51:-1 52:-1 53:-1 54:-1 
2 1:-0.0735368 2:-0.138889 3:-0.454545 4:-0.653543 5:-0.248062 6:-0.131657 7:0.874016 8:0.874016 9:-0.0393701 10:0.731772 11:1 12:-1 13:-1 14:-1 15:-1 16:-1 17:-1 18:-1 19:-1 20:-1 21:-1 22:-1 23:-1 24:-1 25:-1 26:-1 27:-1 28:-1 29:-1 30:-1 31:-1 32:-1 33:-1 34:-1 35:-1 36:-1 37:-1 38:-1 39:-1 40:-1 41:-1 42:-1 43:-1 44:1 45:-1 46:-1 47:-1 48:-1 49:-1 50:-1 51:-1 52:-1 53:-1 54:-1 
1 1:-0.263632 2:-0.75 3:-0.939394 4:-0.780959 5:-0.555556 6:-0.890122 7:0.732283 8:0.84252 9:0.181102 10:0.720898 11:1 12:-1 13:-1 14:-1 15:-1 16:-1 17:-1 18:-1 19:-1 20:-1 21:-1 22:-1 23:-1 24:-1 25:-1 26:-1 27:-1 28:-1 29:-1 30:-1 31:-1 32:-1 33:-1 34:-1 35:-1 36:-1 37:-1 38:-1 39:-1 40:-1 41:-1 42:-1 43:1 44:-1 45:-1 46:-1 47:-1 48:-1 49:-1 50:-1 51:-1 52:-1 53:-1 54:-1 
...

The .range file looks like:

x
-1 1
1 1859 3858
2 0 360
3 0 66
4 0 1397
...
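If you want to read these values programmatically rather than by eye, a small parser is enough. This is a sketch that assumes the layout shown above (an "x" line, then the lower and upper limits, then one "index min max" line per feature); the function name `parse_range` is hypothetical, and any lines before the "x" line are simply skipped.

```python
def parse_range(text):
    """Parse the feature section of an svm-scale .range file.

    Returns (lower, upper, ranges), where ranges maps a feature
    index to its (min, max) pair.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    start = lines.index("x")  # the feature section starts at the "x" line
    lower, upper = map(float, lines[start + 1].split())
    ranges = {}
    for ln in lines[start + 2:]:
        index, f_min, f_max = ln.split()
        ranges[int(index)] = (float(f_min), float(f_max))
    return lower, upper, ranges


# The first lines of the .range file shown above.
sample = """x
-1 1
1 1859 3858
2 0 360
3 0 66
4 0 1397
"""

lower, upper, ranges = parse_range(sample)
print(lower, upper)
print(ranges[1])
```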

So, taking the lower and upper limits as -1 and 1 (the second line of the .range file), you can verify the conversion of the first element, 2596:

value = -1 + (1 - (-1)) * (2596 - 1859) / (3858 - 1859) = -0.26263131565782893

This matches the first scaled value in the output, -0.262631 :)
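The same check can be repeated for the other features of the first row. Here is a short Python sketch that uses only the numbers shown above (the first four features and their ranges) and formats the results with six significant digits so they can be compared with the .scale file:

```python
# Per-feature (min, max) for the first four features, copied from the .range file above.
ranges = {1: (1859, 3858), 2: (0, 360), 3: (0, 66), 4: (0, 1397)}
# First four features of the first row of the raw file.
raw = {1: 2596, 2: 51, 3: 3, 4: 258}
lower, upper = -1.0, 1.0

scaled = {}
for index, value in raw.items():
    f_min, f_max = ranges[index]
    scaled[index] = lower + (upper - lower) * (value - f_min) / (f_max - f_min)

# %g-style formatting (six significant digits), for comparison with the .scale file.
line = " ".join(f"{i}:{v:g}" for i, v in scaled.items())
print(line)
```

The printed line should reproduce the first four entries of the first scaled row: 1:-0.262631 2:-0.716667 3:-0.909091 4:-0.630637.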

Tip:

Normally you scale your training set with svm-scale and save the range file (-s), fit your model (choosing the parameters with k-fold cross-validation), and finally scale the test data with the minimum and maximum values obtained from the training set, by restoring that range file with -r. You can see this process in the file tools/easy.py.
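To make the idea concrete, here is a minimal Python sketch of that workflow on made-up data. The helper names `fit_range` and `apply_range` are hypothetical, and the sketch assumes dense rows with no constant columns (a column whose minimum equals its maximum would cause a division by zero here).

```python
def fit_range(rows):
    """Per-column (min, max), computed from the training rows only."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def apply_range(rows, ranges, lower=-1.0, upper=1.0):
    """Scale rows with previously computed ranges, using the same formula as above."""
    scaled = []
    for row in rows:
        scaled.append([
            lower + (upper - lower) * (v - lo) / (hi - lo)
            for v, (lo, hi) in zip(row, ranges)
        ])
    return scaled


# Hypothetical data, for illustration only.
train = [[10.0, 1.0], [20.0, 3.0], [30.0, 5.0]]
test = [[25.0, 6.0]]

ranges = fit_range(train)                  # the role of the saved .range file
train_scaled = apply_range(train, ranges)
test_scaled = apply_range(test, ranges)    # reuse the training ranges on the test set
print(train_scaled)
print(test_scaled)
```

Note that test values can fall outside the [-1, 1] interval when they lie outside the training range, as the second test feature does here. Ranges computed on the full dataset before splitting, or separately for each file, will generally differ from the training-set ranges, which is one possible reason why hand-scaled values are only roughly close to the svm-scale output.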



Tags: scale libsvm