CSV format for OpenCV machine learning algorithms

Posted 2019-07-01 19:02

Machine learning algorithms in OpenCV appear to use data read in CSV format. See for example this cpp file. The data is read into an OpenCV machine learning class CvMLData using the following code:

CvMLData data;
data.read_csv( filename );

However, there does not appear to be any readily available documentation on the required format for the csv file. Does anyone know how the csv file should be arranged?

Other (non-OpenCV) programs tend to have one line per training example, beginning with an integer or string that indicates the class label.

1 answer
来,给爷笑一个
Reply #2 · 2019-07-01 19:45

Reading the source for that class (particularly the str_to_flt_elem function) together with the class documentation, I conclude that the valid formats for individual items in the file are:

  1. Anything that can be parsed to a double by strtod
  2. A question mark (?) or the empty string to represent missing values
  3. Any string that doesn't parse to a double.

Items 1 and 2 are only valid for features. Anything matched by item 3 is assumed to be a class label, and as far as I can deduce the order of the items doesn't matter. The read_csv function automatically assigns each column in the CSV file the correct type, and (if you want) you can override which column holds the labels with set_response_idx. As for the delimiter, you can use the default (,) or set it to whatever you like with set_delimiter before calling read_csv (as long as you don't use the decimal point).
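To make the three item rules concrete, here is a minimal standalone sketch of how a single item could be classified. It is not OpenCV's code: ItemKind and classifyItem are hypothetical names, and the only assumption taken from the answer is that numbers are recognised with strtod and that ? or an empty string means a missing value.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical names; this mirrors the three rules above, not OpenCV's source.
enum class ItemKind { Number, Missing, Label };

ItemKind classifyItem(const std::string& token, char missingChar = '?') {
    // Rule 2: the empty string or the lone missing-value character.
    if (token.empty() || (token.size() == 1 && token[0] == missingChar))
        return ItemKind::Missing;

    // Rule 1: strtod must consume the entire token.
    char* end = nullptr;
    std::strtod(token.c_str(), &end);
    if (end != token.c_str() && *end == '\0')
        return ItemKind::Number;

    // Rule 3: everything else is treated as a class label.
    return ItemKind::Label;
}
```

Note that a token such as 1.2abc falls under rule 3 in this sketch, because strtod stops before the end of the string.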

So, for example, this should work for 6 data points in 3 classes with 3 features per point:

A,1.2,3.2e-2,+4.1
A,3.2,?,3.1
B,4.2,,+0.2
B,4.3,2.0e3,.1
C,2.3,-2.1e+3,-.1
C,9.3,-9e2,10.4

You can move your text label to any column you want, or even have multiple text labels.
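As a rough illustration of why the label column can sit anywhere, the sketch below scans rows and flags every column that contains at least one text label. It is my own simplification, not OpenCV's parser: splitLine, isLabel and labelColumns are hypothetical helpers, the splitter keeps empty fields, and I assume a column counts as a label column as soon as one entry fails to parse as a number.

```cpp
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Split one line on a delimiter, keeping empty fields
// (a simplification of my own, not OpenCV's tokenizer).
std::vector<std::string> splitLine(const std::string& line, char delim = ',') {
    std::vector<std::string> fields;
    std::string field;
    std::istringstream in(line);
    while (std::getline(in, field, delim)) fields.push_back(field);
    // getline drops a trailing empty field, so restore it.
    if (!line.empty() && line.back() == delim) fields.push_back("");
    return fields;
}

// True if the token is neither missing ("" or "?") nor fully parsed by strtod.
bool isLabel(const std::string& token) {
    if (token.empty() || token == "?") return false;
    char* end = nullptr;
    std::strtod(token.c_str(), &end);
    return end == token.c_str() || *end != '\0';
}

// One flag per column: true if any row holds a text label in that column.
std::vector<bool> labelColumns(const std::vector<std::string>& lines) {
    std::vector<bool> flags;
    for (const std::string& line : lines) {
        const std::vector<std::string> fields = splitLine(line);
        if (fields.size() > flags.size()) flags.resize(fields.size(), false);
        for (std::size_t i = 0; i < fields.size(); ++i)
            if (isLabel(fields[i])) flags[i] = true;
    }
    return flags;
}
```

Fed the six example rows, this flags only the first column; moving the A/B/C labels to another column simply moves the flag with them.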
