I am using libsvm for binary classification.. I wanted to try grid.py , as it is said to improve results.. I ran this script for five files in separate terminals , and the script has been running for more than 12 hours..
this is the state of my 5 terminals now :
[root@localhost tools]# python grid.py sarts_nonarts_feat.txt>grid_arts.txt
Warning: empty z range [61.3997:61.3997], adjusting to [60.7857:62.0137]
line 2: warning: Cannot contour non grid data. Please use "set dgrid3d".
Warning: empty z range [61.3997:61.3997], adjusting to [60.7857:62.0137]
line 4: warning: Cannot contour non grid data. Please use "set dgrid3d".
[root@localhost tools]# python grid.py sgames_nongames_feat.txt>grid_games.txt
Warning: empty z range [64.5867:64.5867], adjusting to [63.9408:65.2326]
line 2: warning: Cannot contour non grid data. Please use "set dgrid3d".
Warning: empty z range [64.5867:64.5867], adjusting to [63.9408:65.2326]
line 4: warning: Cannot contour non grid data. Please use "set dgrid3d".
[root@localhost tools]# python grid.py sref_nonref_feat.txt>grid_ref.txt
Warning: empty z range [62.4602:62.4602], adjusting to [61.8356:63.0848]
line 2: warning: Cannot contour non grid data. Please use "set dgrid3d".
Warning: empty z range [62.4602:62.4602], adjusting to [61.8356:63.0848]
line 4: warning: Cannot contour non grid data. Please use "set dgrid3d".
[root@localhost tools]# python grid.py sbiz_nonbiz_feat.txt>grid_biz.txt
Warning: empty z range [67.9762:67.9762], adjusting to [67.2964:68.656]
line 2: warning: Cannot contour non grid data. Please use "set dgrid3d".
Warning: empty z range [67.9762:67.9762], adjusting to [67.2964:68.656]
line 4: warning: Cannot contour non grid data. Please use "set dgrid3d".
[root@localhost tools]# python grid.py snews_nonnews_feat.txt>grid_news.txt
Wrong input format at line 494
Traceback (most recent call last):
File "grid.py", line 223, in run
if rate is None: raise "get no rate"
TypeError: exceptions must be classes or instances, not str
I had redirected the outputs to files , but those files for now contain nothing.. And , the following files were created :
- sbiz_nonbiz_feat.txt.out
- sbiz_nonbiz_feat.txt.png
- sarts_nonarts_feat.txt.out
- sarts_nonarts_feat.txt.png
- sgames_nongames_feat.txt.out
- sgames_nongames_feat.txt.png
- sref_nonref_feat.txt.out
- sref_nonref_feat.txt.png
- snews_nonnews_feat.txt.out (--> is empty )
There's just one line of information in .out files..
the ".png" files are some GNU PLOTS .
But i dont understand what the above GNUplots / warnings convey .. Should i re-run them ?
Can anyone please tell me on how much time this script might take if each input file contains about 144000 lines..
Thanks and regards
Change:
in line 223 in
grid.py
to:Also, try adding:
after this line in
grid.py
:This should fix your warnings and errors, but I am not sure if it will work, since
grid.py
seems to think your data has norate
.I guess
grid.py
is trying to find the optimal value for C (or Nu)?I don't have an answer for the amount of time it will take, but you might want to try this SVM library, even though it's an R package: svmpath.
As described on that page there, it will compute the entire "regularization path" for a two class SVM classifier in about as much time as it takes to train an SVM using one value of your penalty param C (or Nu).
So, instead of training and doing cross validation for an SVM with a value x for your C parameter, then doing all of that again for value x+1 for C, x+2, etc. You can just train the SVM once, then query its predictive performance for different values of C post-facto, so to speak.
The libSVM faq speaks to your question:
As a side note, you can parallelize your grid.py operations. The libSVM tools directory README file has this to say on the matter:
Parallel grid search
You can conduct a parallel grid search by dispatching jobs to a cluster of computers which share the same file system. First, you add machine names in grid.py:
ssh_workers = ["linux1", "linux5", "linux5"]
and then setup your ssh so that the authentication works without asking a password.
The same machine (e.g., linux5 here) can be listed more than once if it has multiple CPUs or has more RAM. If the local machine is the best, you can also enlarge the nr_local_worker. For example:
nr_local_worker = 2
In my Ubuntu 10.04 installation grid.py is actually /usr/bin/svm-grid.py
Your data is huge, 144 000 lines. So this will take sometime. I used large data such as yours and it took up to a week to finish. If you using images, which I suppose you are, hence the large data, try resizing your image before creating the data. You should get approximately the same results with your images resized.