Question:

I have an ARFF file containing 14 numerical columns. I want to normalize each column separately, that is, change each value in a column to (actual_value - min(this_column)) / (max(this_column) - min(this_column)). All values in a column will then be in the range [0, 1]. The min and max values of one column might differ from those of another column.

How can I do this with Weka filters?

Thanks

Answer 1:

This can be done using

weka.filters.unsupervised.attribute.Normalize

After applying this filter, all values in each column will be in the range [0, 1].
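To make the per-column formula from the question concrete, here is a minimal plain-Java sketch of min-max scaling. It is an illustration only, not Weka's source code: the class name MinMaxSketch, the method normalizeColumns, and the choice to map a constant column to 0 are assumptions made for this example.

```java
// Plain-Java sketch of per-column min-max scaling; not Weka's actual source.
final class MinMaxSketch {

    // Rescales every column of rows[row][col] to [0, 1] independently.
    static double[][] normalizeColumns(double[][] rows) {
        if (rows.length == 0) {
            return new double[0][];
        }
        int numCols = rows[0].length;
        double[][] out = new double[rows.length][numCols];
        for (int c = 0; c < numCols; c++) {
            // Find this column's own min and max.
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (double[] row : rows) {
                min = Math.min(min, row[c]);
                max = Math.max(max, row[c]);
            }
            double range = max - min;
            for (int r = 0; r < rows.length; r++) {
                // Assumption: a constant column (range 0) maps to 0 to avoid dividing by zero.
                out[r][c] = range == 0.0 ? 0.0 : (rows[r][c] - min) / range;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = { {1.0, 100.0}, {2.0, 300.0}, {3.0, 200.0} };
        for (double[] row : normalizeColumns(data)) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```

Each column uses its own min and max, so columns with very different scales both end up in [0, 1].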

Answer 2:

That's right. One reminder about the difference between "normalization" and "standardization": what the question describes is min-max normalization, which is what Weka's Normalize filter performs. Standardization (weka.filters.unsupervised.attribute.Standardize) instead rescales each attribute using its mean and standard deviation, giving zero mean and unit variance. If your data contains an outlier, the Normalize filter can distort the distribution, because the min or max may lie much farther out than the other instances and squeeze them into a narrow part of [0, 1].
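The effect of an outlier on the two scalings can be seen in a small plain-Java sketch. It is not Weka code: the class name ScalingComparisonSketch, the helper names, the sample values, and the use of the sample standard deviation (dividing by n - 1) are all assumptions made for illustration.

```java
// Plain-Java comparison of min-max scaling and mean/std-dev scaling on one column.
final class ScalingComparisonSketch {

    // Min-max scaling: (v - min) / (max - min), so results lie in [0, 1].
    static double[] minMax(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = range == 0.0 ? 0.0 : (values[i] - min) / range;
        }
        return out;
    }

    // Mean/std-dev scaling: (v - mean) / stdDev, giving zero mean and unit variance.
    // Assumption: sample standard deviation (divide by n - 1).
    static double[] standardize(double[] values) {
        int n = values.length;
        double[] out = new double[n];
        if (n < 2) {
            return out; // too few values to estimate a spread; leave zeros
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / n;
        double squared = 0.0;
        for (double v : values) {
            squared += (v - mean) * (v - mean);
        }
        double stdDev = Math.sqrt(squared / (n - 1));
        for (int i = 0; i < n; i++) {
            out[i] = stdDev == 0.0 ? 0.0 : (values[i] - mean) / stdDev;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] column = {1.0, 2.0, 3.0, 4.0, 100.0}; // 100.0 plays the outlier
        System.out.println(java.util.Arrays.toString(minMax(column)));
        System.out.println(java.util.Arrays.toString(standardize(column)));
    }
}
```

With the outlier present, the min-max output places the first four values very close to 0, while the outlier alone sits at 1.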

Answer 3:

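Here is a working normalization example with K-Means in Java.

// imports needed by the snippet below
import java.io.BufferedReader;
import java.io.FileReader;
import weka.clusterers.ClusterEvaluation;
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Normalize;
import weka.filters.unsupervised.attribute.Remove;

final SimpleKMeans kmeans = new SimpleKMeans();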

final String[] options = weka.core.Utils
        .splitOptions("-init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 10 -A \"weka.core.EuclideanDistance -R first-last\" -I 500 -num-slots 1 -S 50");
kmeans.setOptions(options);

kmeans.setSeed(10);
kmeans.setPreserveInstancesOrder(true);
kmeans.setNumClusters(25);
kmeans.setMaxIterations(1000);

final BufferedReader datafile = new BufferedReader(new FileReader("/Users/data.arff"));
Instances data = new Instances(datafile);

// normalize every numeric attribute to the range [0, 1]
final Normalize normalizeFilter = new Normalize();
normalizeFilter.setInputFormat(data);
data = Filter.useFilter(data, normalizeFilter);

// remove the class column (index 0) so it is not used for clustering
data.setClassIndex(0);
final Remove removeFilter = new Remove();
removeFilter.setAttributeIndices("" + (data.classIndex() + 1));
removeFilter.setInputFormat(data);
data = Filter.useFilter(data, removeFilter);

kmeans.buildClusterer(data);

System.out.println(kmeans.toString());

// evaluate clusterer
final ClusterEvaluation eval = new ClusterEvaluation();
eval.setClusterer(kmeans);
eval.evaluateClusterer(data);
System.out.println(eval.clusterResultsToString());

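If you have a CSV file, replace the two lines above that read the ARFF file with the DataSource shown below (weka.core.converters.ConverterUtils.DataSource):

final DataSource source = new DataSource("/Users/data.csv");
// not final: data is reassigned by the filters that follow
Instances data = source.getDataSet();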
