I have an ARFF file containing 14 numerical columns. I want to normalize each column separately, that is, replace each value in a column with (actual_value - min(this_column)) / (max(this_column) - min(this_column)). All values in a column will then be in the range [0, 1]. The min and max values of one column may differ from those of another column.
How can I do this with Weka filters?
Thanks
Here is a working normalization example with K-Means in Java.
If you have a CSV file, replace the BufferedReader line above with the DataSource call below:
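This can be done using Weka's Normalize filter (weka.filters.unsupervised.attribute.Normalize).
After applying this filter, all values in each numeric column will be in the range [0, 1], with the min and max computed separately for each column.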
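To make the per-column arithmetic concrete, here is a minimal plain-Java sketch of the formula from the question. It does not use Weka and is not the filter's actual implementation. The class name MinMaxSketch and the method normalizeColumns are made up for illustration, and mapping a constant column to 0 is an assumption chosen to avoid dividing by zero.

```java
import java.util.Arrays;

public class MinMaxSketch {

    // Returns a new array in which every column of `data` is rescaled to [0, 1]
    // using (value - min(column)) / (max(column) - min(column)).
    public static double[][] normalizeColumns(double[][] data) {
        int rows = data.length;
        int cols = rows == 0 ? 0 : data[0].length;
        double[][] out = new double[rows][cols];

        for (int c = 0; c < cols; c++) {
            // Find this column's own min and max.
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (int r = 0; r < rows; r++) {
                min = Math.min(min, data[r][c]);
                max = Math.max(max, data[r][c]);
            }

            double range = max - min;
            for (int r = 0; r < rows; r++) {
                // Assumption: a constant column (range == 0) is mapped to 0.
                out[r][c] = range == 0.0 ? 0.0 : (data[r][c] - min) / range;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two columns with very different ranges.
        double[][] data = { {1.0, 100.0}, {2.0, 300.0}, {3.0, 200.0} };
        for (double[] row : normalizeColumns(data)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Each column is scaled with its own min and max, so columns with very different ranges all end up in [0, 1].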
That's right. Just a reminder about the difference between "normalization" and "standardization". What the question describes is min-max "normalization" (Weka's Normalize filter), while "standardization" (Weka's Standardize filter) rescales each attribute using its mean and standard deviation, and is often used when attributes are roughly Gaussian. If you have an outlier in your data, the Normalize filter might hurt your data distribution, because the min or max might be much farther away than the other instances.
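The outlier effect can be seen in a small plain-Java sketch that applies both rescalings to one column. It does not use Weka. The class ScalingComparison and its methods are hypothetical names, and using the sample standard deviation (dividing by n - 1) is an assumption made for this illustration.

```java
import java.util.Arrays;

public class ScalingComparison {

    // Min-max rescaling of one column to [0, 1]; a constant column maps to 0.
    public static double[] minMax(double[] col) {
        double min = Arrays.stream(col).min().orElse(0.0);
        double max = Arrays.stream(col).max().orElse(0.0);
        double range = max - min;
        double[] out = new double[col.length];
        for (int i = 0; i < col.length; i++) {
            out[i] = range == 0.0 ? 0.0 : (col[i] - min) / range;
        }
        return out;
    }

    // Rescaling to zero mean and unit standard deviation.
    // Assumption: sample standard deviation (n - 1 in the denominator).
    public static double[] zScore(double[] col) {
        int n = col.length;
        double mean = Arrays.stream(col).average().orElse(0.0);
        double sumSq = 0.0;
        for (double v : col) {
            sumSq += (v - mean) * (v - mean);
        }
        double sd = n > 1 ? Math.sqrt(sumSq / (n - 1)) : 0.0;
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = sd == 0.0 ? 0.0 : (col[i] - mean) / sd;
        }
        return out;
    }

    public static void main(String[] args) {
        // Four ordinary values and one outlier.
        double[] col = { 1.0, 2.0, 3.0, 4.0, 100.0 };
        System.out.println("min-max: " + Arrays.toString(minMax(col)));
        System.out.println("z-score: " + Arrays.toString(zScore(col)));
    }
}
```

With the outlier at 100, min-max scaling squeezes the four ordinary values into a narrow band near 0, because the outlier alone sets the max.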