I do not understand why the constructor of Weka's Evaluation class needs the training instances to work.
Can anybody explain this to me?
In theory, the evaluation depends only on the trained model (cls in the code below) and the test data (TestingSet).
Thanks!
This is an example:
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;

// TrainingSet is the training Instances
// TestingSet is the testing Instances
// Build the classifier
Classifier cls = new NaiveBayes();
cls.buildClassifier(TrainingSet);
// Test the model
Evaluation eTest = new Evaluation(TrainingSet); // why is the training set needed here?
eTest.evaluateModel(cls, TestingSet);
Most algorithms work on numeric data, so all the non-numeric values of a feature have to be converted into a numeric form. This mapping has to be unique: every occurrence of a given non-numeric value is mapped to the same numeric value.
During training, the data pre-processor sees the data for the very first time. While converting the non-numeric data, the pre-processor uses maps to remember the mapping. For example, if all possible values for a feature are {yes, no, maybe}, these values could be mapped like this:
{"yes":1, "no":2, "maybe":3}
So the input feature/column, which looked like
[yes,yes,no,yes,maybe,yes]
would now be converted into the internal form
[1,1,2,1,3,1]
These numeric values are what the algorithms use. In Weka, this mapping is stored in the training Instances. So when the evaluator predicts a numeric value for a feature, it needs to convert that number back to its actual value.
For example, if the algorithm outputs 2, it needs the map to figure out that 2 corresponds to 'no'. To do this, the evaluation needs the mapping that was created before training. Hence it asks for the training Instances.
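The encode/decode round trip described above can be sketched in plain Java. This is only an illustration, not Weka code: the class NominalMappingSketch and its methods encode, decode and encodeAll are hypothetical names invented for this example, and the 1-based codes simply follow the {"yes":1, "no":2, "maybe":3} mapping used in this answer. Weka itself keeps the labels in the attribute definitions of the Instances header and stores each nominal value as a 0-based index into that list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT part of Weka: it mimics how a nominal attribute
// remembers the mapping between labels and numeric codes.
class NominalMappingSketch {
    private final List<String> labels = new ArrayList<>();
    private final Map<String, Integer> codes = new LinkedHashMap<>();

    // Assign codes in the order the possible values are declared.
    public NominalMappingSketch(List<String> possibleValues) {
        for (String value : possibleValues) {
            if (!codes.containsKey(value)) {
                labels.add(value);
                codes.put(value, labels.size()); // 1-based, as in the answer's example
            }
        }
    }

    // Label -> numeric code (what the learning algorithm works with).
    public int encode(String label) {
        Integer code = codes.get(label);
        if (code == null) {
            throw new IllegalArgumentException("Unknown label: " + label);
        }
        return code;
    }

    // Numeric code -> label (what is needed to interpret a prediction).
    public String decode(int code) {
        if (code < 1 || code > labels.size()) {
            throw new IllegalArgumentException("Unknown code: " + code);
        }
        return labels.get(code - 1);
    }

    // Convert a whole column of labels to its internal numeric form.
    public int[] encodeAll(List<String> column) {
        int[] result = new int[column.size()];
        for (int i = 0; i < column.size(); i++) {
            result[i] = encode(column.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        NominalMappingSketch mapping =
                new NominalMappingSketch(List.of("yes", "no", "maybe"));
        int[] encoded = mapping.encodeAll(
                List.of("yes", "yes", "no", "yes", "maybe", "yes"));
        System.out.println(Arrays.toString(encoded)); // prints [1, 1, 2, 1, 3, 1]
        System.out.println(mapping.decode(2));        // prints no
    }
}
```

Without the mapping object, the value 2 returned by a model is just a number; with it, decode(2) recovers 'no'. That is the role the answer assigns to the training Instances.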
Note: AFAIK the same logic applies in all ML frameworks, such as Weka, dl4j, etc.
From the UMass Boston Computer Science documentation on Weka:
You can take a look at the constructor source here.
I have one possible solution to my own question. I was looking for a way to evaluate a test file using a classifier model that was previously trained and saved to a file. The Evaluation class does not work for me because it needs the training data in its constructor. However, the classifier's classifyInstance method can be used instead.
The following code is an example: