Load svmlight format error

When I try to use the svmlight Python package with data I have already converted to svmlight format, I get an error. It should be pretty basic, but I don't understand what's happening. Here's the code:

import svmlight
training_data = open('thedata', "w")
model=svmlight.learn(training_data, type='classification', verbosity=0)

I've also tried:

training_data = numpy.load('thedata')

and

training_data = __import__('thedata')

Tags: python import load format svmlight

1 Answer

Deceive

#2 · 2020-05-06 12:42

One obvious problem is that you are truncating your data file when you open it because you are specifying write mode "w". This means that there will be no data to read.
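To see the truncation for yourself, here is a minimal sketch. It uses a throwaway temporary file with a made-up line in it rather than your real data, and compares the file size after opening it for reading and then for writing:

```python
import os
import tempfile

# Create a throwaway file holding one made-up line in SVM format.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("1 1:0.5 2:0.25\n")

# Opening in read mode leaves the contents alone.
with open(path, "r") as f:
    f.read()
size_after_read = os.path.getsize(path)

# Opening in write mode empties the file at once, even if nothing is written.
with open(path, "w"):
    pass
size_after_write = os.path.getsize(path)

os.remove(path)
print(size_after_read, size_after_write)
```

The second number printed is 0, which is why the original file has to be regenerated after it has been opened with "w".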

Anyway, you don't need to read the file like that. If your data file is like the one in this example, it is a Python file, so you need to import it instead. This should work:

import svmlight
from data import train0 as training_data    # assuming your data file is named data.py
# or you could use __import__()
#training_data = __import__('data').train0

model = svmlight.learn(training_data, type='classification', verbosity=0)

You might want to compare your data against that of the example.

Edit after data file format clarified

The input file needs to be parsed into a list of tuples like this:

[(target, [(feature_1, value_1), (feature_2, value_2), ... (feature_n, value_n)]),
 (target, [(feature_1, value_1), (feature_2, value_2), ... (feature_n, value_n)]),
 ...
]
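For a concrete picture of that structure, here is a small sketch with made-up values. The check_examples helper is hypothetical (it is not part of the svmlight package) and only checks the shape and types described above:

```python
# Two made-up training examples:
# label +1 with features 1 and 3, label -1 with feature 2.
training_data = [
    (1, [(1, 0.5), (3, 0.25)]),
    (-1, [(2, 1.0)]),
]


def check_examples(data):
    """Return True if data looks like [(target, [(feature, value), ...]), ...]."""
    try:
        for target, features in data:
            if not isinstance(target, (int, float)):
                return False
            for feature, value in features:
                if not isinstance(feature, int):
                    return False
                if not isinstance(value, (int, float)):
                    return False
    except (TypeError, ValueError):
        # Raised when an item cannot be unpacked into exactly two parts.
        return False
    return True


print(check_examples(training_data))  # True
```

Running a check like this before calling svmlight.learn() can make it easier to tell a data shape problem apart from a problem inside the library.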

The svmlight package does not appear to support reading from a file in the SVM file format, and there aren't any parsing functions, so it will have to be implemented in Python. SVM files look like this:

<target> <feature>:<value> <feature>:<value> ... <feature>:<value> # <info>

So here is a parser that converts from the file format to the structure required by the svmlight package:

def svm_parse(filename):
    """Yield (target, [(feature, value), ...]) tuples from a file in SVM format."""

    def _convert(t):
        """Convert feature and value to appropriate types"""
        return (int(t[0]), float(t[1]))

    with open(filename) as f:
        for line in f:
            line = line.split('#')[0].strip()  # remove any comment, full-line or trailing
            if not line:
                continue  # skip blank and comment-only lines
            data = line.split()
            target = float(data[0])
            features = [_convert(feature.split(':')) for feature in data[1:]]
            yield (target, features)

And you can use it like this:

import svmlight

training_data = list(svm_parse('thedata'))
model = svmlight.learn(training_data, type='classification', verbosity=0)
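As a quick sanity check on the conversion, here is a minimal sketch of what a single line turns into. The helper name parse_svm_line and the sample line are made up for illustration; it applies the same steps as svm_parse to one string instead of a file, so it can be tried without the svmlight package installed:

```python
def parse_svm_line(line):
    """Convert one SVM-format line to (target, [(feature, value), ...]).

    Returns None for blank lines and comment-only lines.
    """
    line = line.split('#')[0].strip()  # drop any comment
    if not line:
        return None
    fields = line.split()
    target = float(fields[0])
    features = []
    for field in fields[1:]:
        feature, value = field.split(':')
        features.append((int(feature), float(value)))
    return (target, features)


example = parse_svm_line("-1 3:0.5 7:1 12:0.25 # made-up sample")
print(example)  # (-1.0, [(3, 0.5), (7, 1.0), (12, 0.25)])
```

If your file contains anything other than integer feature numbers before the colon, the int() conversion will raise a ValueError, which at least points at the offending line.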
