How much time does grid.py take to run?

Posted 2019-05-31 11:06

Question:

I am using libsvm for binary classification. I wanted to try grid.py, as it is said to improve results. I ran the script on five files in separate terminals, and it has been running for more than 12 hours.

This is the current state of my five terminals:

[root@localhost tools]# python grid.py sarts_nonarts_feat.txt>grid_arts.txt
Warning: empty z range [61.3997:61.3997], adjusting to [60.7857:62.0137]
         line 2: warning: Cannot contour non grid data. Please use "set dgrid3d".
Warning: empty z range [61.3997:61.3997], adjusting to [60.7857:62.0137]
         line 4: warning: Cannot contour non grid data. Please use "set dgrid3d".

[root@localhost tools]# python grid.py sgames_nongames_feat.txt>grid_games.txt
Warning: empty z range [64.5867:64.5867], adjusting to [63.9408:65.2326]
         line 2: warning: Cannot contour non grid data. Please use "set dgrid3d".
Warning: empty z range [64.5867:64.5867], adjusting to [63.9408:65.2326]
         line 4: warning: Cannot contour non grid data. Please use "set dgrid3d".

[root@localhost tools]# python grid.py sref_nonref_feat.txt>grid_ref.txt
Warning: empty z range [62.4602:62.4602], adjusting to [61.8356:63.0848]
         line 2: warning: Cannot contour non grid data. Please use "set dgrid3d".
Warning: empty z range [62.4602:62.4602], adjusting to [61.8356:63.0848]
         line 4: warning: Cannot contour non grid data. Please use "set dgrid3d".

[root@localhost tools]# python grid.py sbiz_nonbiz_feat.txt>grid_biz.txt
Warning: empty z range [67.9762:67.9762], adjusting to [67.2964:68.656]
         line 2: warning: Cannot contour non grid data. Please use "set dgrid3d".
Warning: empty z range [67.9762:67.9762], adjusting to [67.2964:68.656]
         line 4: warning: Cannot contour non grid data. Please use "set dgrid3d".

[root@localhost tools]# python grid.py snews_nonnews_feat.txt>grid_news.txt
Wrong input format at line 494
Traceback (most recent call last):
  File "grid.py", line 223, in run
    if rate is None: raise "get no rate"
TypeError: exceptions must be classes or instances, not str

I redirected the output to files, but so far those files are empty. The following files were also created:

  • sbiz_nonbiz_feat.txt.out
  • sbiz_nonbiz_feat.txt.png
  • sarts_nonarts_feat.txt.out
  • sarts_nonarts_feat.txt.png
  • sgames_nongames_feat.txt.out
  • sgames_nongames_feat.txt.png
  • sref_nonref_feat.txt.out
  • sref_nonref_feat.txt.png
  • snews_nonnews_feat.txt.out (empty)

Each .out file contains just one line of information. The .png files are gnuplot plots.

But I don't understand what the gnuplot plots and warnings above mean. Should I re-run the scripts?

Can anyone tell me how long this script might take if each input file contains about 144,000 lines?

Thanks and regards

Answer 1:

Your data is huge: 144,000 lines. So this will take some time. I have used data as large as yours, and it took up to a week to finish. If you are using images, which I suppose you are given the size of the data, try resizing the images before creating the data. You should get approximately the same results with the resized images.
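
For a rough sense of why this takes so long, it helps to count how many SVM trainings a grid search launches. The sketch below assumes grid.py's documented defaults (log2c from -5 to 15 in steps of 2, log2g from 3 to -15 in steps of -2, and 5-fold cross-validation). The helper names are made up for illustration and are not part of grid.py.

```python
def inclusive_range(begin, end, step):
    """Return begin, begin+step, ... up to and including end."""
    if step == 0:
        raise ValueError("step must be non-zero")
    values = []
    x = begin
    while (step > 0 and x <= end) or (step < 0 and x >= end):
        values.append(x)
        x += step
    return values


def grid_size(log2c=(-5, 15, 2), log2g=(3, -15, -2), folds=5):
    """Count (C, gamma) pairs and the total number of SVM trainings.

    The default ranges are assumed to match grid.py's defaults.
    """
    n_c = len(inclusive_range(*log2c))
    n_g = len(inclusive_range(*log2g))
    pairs = n_c * n_g
    return pairs, pairs * folds


pairs, trainings = grid_size()
print("parameter pairs:", pairs)
print("trainings with 5-fold cross-validation:", trainings)
```

Under those assumed defaults this comes to 110 (C, gamma) pairs and 550 trainings per input file, each on about four fifths of the data. Narrowing the ranges or increasing the step size reduces the number of trainings accordingly.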



Answer 2:

The LIBSVM FAQ speaks to your question:

Q: Why grid.py/easy.py sometimes generates the following warning message?

Warning: empty z range [62.5:62.5], adjusting to [61.875:63.125]
Notice: cannot contour non grid data!

Nothing is wrong and please disregard the message. It is from gnuplot when drawing the contour.

As a side note, you can parallelize your grid.py operations. The libSVM tools directory README file has this to say on the matter:

Parallel grid search

You can conduct a parallel grid search by dispatching jobs to a cluster of computers which share the same file system. First, you add machine names in grid.py:

ssh_workers = ["linux1", "linux5", "linux5"]

and then setup your ssh so that the authentication works without asking a password.

The same machine (e.g., linux5 here) can be listed more than once if it has multiple CPUs or has more RAM. If the local machine is the best, you can also enlarge the nr_local_worker. For example:

nr_local_worker = 2

In my Ubuntu 10.04 installation grid.py is actually /usr/bin/svm-grid.py



Answer 3:

I guess grid.py is trying to find the optimal value for C (or Nu)?

I don't have an answer for the amount of time it will take, but you might want to try this SVM library, even though it's an R package: svmpath.

As described on its page, it will compute the entire "regularization path" for a two-class SVM classifier in about as much time as it takes to train an SVM using one value of your penalty parameter C (or Nu).

So, instead of training and cross-validating an SVM with a value x for your C parameter, then doing all of that again for x+1, x+2, and so on, you can train the SVM once and then query its predictive performance for different values of C after the fact.



Answer 4:

Change:

if rate is None: raise "get no rate"

in line 223 in grid.py to:

if rate is None: raise ValueError("get no rate")
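
The reason for this change is that Python no longer accepts a plain string as an exception, so the original line raises a TypeError that hides the intended message. A minimal sketch of the difference follows; parse_rate is a hypothetical stand-in for the check, not a function in grid.py.

```python
def parse_rate(rate):
    # Hypothetical stand-in for the check around line 223 of grid.py.
    if rate is None:
        raise ValueError("get no rate")
    return rate


# Raising a string is rejected by the interpreter with a TypeError,
# which is the traceback shown in the question.
try:
    raise "get no rate"
except TypeError as exc:
    string_raise_error = type(exc).__name__

# Raising an exception instance carries the intended message instead.
try:
    parse_rate(None)
except ValueError as exc:
    fixed_message = str(exc)

print(string_raise_error)
print(fixed_message)
```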

Also, try adding:

gnuplot.write("set dgrid3d\n")

after this line in grid.py:

gnuplot.write("set contour\n")

This should fix your warnings and errors, but I am not sure if it will work, since grid.py seems to think your data has no rate.
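
Since the failing run also printed "Wrong input format at line 494", it may be worth checking the data file itself before re-running. The sketch below is a simplified check of the "label index:value ..." layout with strictly ascending indices. It is an assumption about what a well-formed line looks like, and the real LIBSVM parser may accept or reject slightly different input. The function name find_bad_lines is made up for illustration. LIBSVM's tools directory also ships a checkdata.py script for this purpose.

```python
import re

# One feature token: integer index, a colon, then a value.
_PAIR = re.compile(r"^(\d+):(\S+)$")


def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def find_bad_lines(lines):
    """Return (line_number, reason) for lines that fail the simplified check."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        tokens = line.split()
        if not tokens:
            bad.append((lineno, "empty line"))
            continue
        if not _is_number(tokens[0]):
            bad.append((lineno, "label is not a number"))
            continue
        prev_index = -1
        for tok in tokens[1:]:
            match = _PAIR.match(tok)
            if match is None or not _is_number(match.group(2)):
                bad.append((lineno, "bad index:value pair %r" % tok))
                break
            index = int(match.group(1))
            if index <= prev_index:
                bad.append((lineno, "indices not in ascending order"))
                break
            prev_index = index
    return bad


sample = ["1 1:0.5 3:2", "abc 1:1", "-1 2:1 1:3", "1 1:x"]
for lineno, reason in find_bad_lines(sample):
    print(lineno, reason)
```

To check a real file, pass an open file object, for example find_bad_lines(open("snews_nonnews_feat.txt")), and look at what is reported around line 494.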