Different results in Weka GUI and Weka via Java co

Posted 2020-04-12 08:55

I'm doing text classification in Weka with the NaiveBayesMultinomialText classifier. When I use the GUI and test on the training data itself (no cross-validation) I get 93% accuracy, but when I do the same thing from Java code I get 67% accuracy. What might be wrong?

In the GUI, I'm using the following configuration:

Lnorm 2.0
debug False
lowercaseTokens True
minWordFrequency 3.0
norm 1.0
normalizeDocLength False
periodicPruning 0
stemmer NullStemmer
stopwords pt-br-stopwords.dat
tokenizer NgramTokenizer (default parameters, but max ngramsize = 2)
useStopList True
useWordFrequencies True

And then I select "Use training set" in "Test options".

In the Java code I have:

        Instances train = readArff("data/naivebayestest/corpus_treino.arff");
        train.setClassIndex(train.numAttributes() - 1);
        NaiveBayesMultinomialText nb = new NaiveBayesMultinomialText();
        String opt = "-W -P 0 -M 5.0 -norm 1.0 -lnorm 2.0 -lowercase -stoplist -stopwords C:\\Users\\Fernando\\workspace\\GPCommentsAnalyzer\\pt-br_stopwords.dat -tokenizer \"weka.core.tokenizers.NGramTokenizer -delimiters ' \\r\\n\\t.,;:\\\'\\\"()?!\' -max 2 -min 1\" -stemmer weka.core.stemmers.NullStemmer";
        nb.setOptions(Utils.splitOptions(opt));                                            
        nb.buildClassifier(train);    

        Evaluation eval = new Evaluation(train);                                           
        eval.evaluateModel(nb, train);
        System.out.println(eval.toSummaryString());                                        
        System.out.println(eval.toClassDetailsString());                                   
        System.out.println(eval.toMatrixString());    

I'm probably missing something in my Java code. Any ideas?

Thanks!

1 answer
何必那么认真
#2 · 2020-04-12 09:33

You can use the code below to evaluate your classifier with 10-fold cross-validation:

        eval.crossValidateModel(nb, train, 10, new Random(1));

Remember not to call train.randomize(...) or train.stratify(10) yourself before that; crossValidateModel already shuffles the data and, for a nominal class, stratifies it.
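
Separately from cross-validation, it may be worth comparing the option string with the GUI settings field by field. In the question the GUI shows minWordFrequency 3.0 while the option string passes -M 5.0, and the stop-word file is spelled pt-br-stopwords.dat in one place and pt-br_stopwords.dat in the other. Either difference could plausibly change the result. Below is a minimal plain-Java sketch of such a check. It does not use Weka. The class OptionCheck and its methods are hypothetical helpers, CODE_OPTIONS is only the numeric part of the question's option string, and the flag-to-field mapping (-P, -M, -norm, -lnorm) is my reading of the classifier's options, so please verify it against the Weka documentation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Weka: compares numeric flags in an
// option string against the values shown in the GUI.
class OptionCheck {

    // Numeric part of the option string from the question
    // (tokenizer, stop-word and stemmer options are left out here).
    static final String CODE_OPTIONS =
            "-W -P 0 -M 5.0 -norm 1.0 -lnorm 2.0 -lowercase -stoplist";

    // GUI values from the question, keyed by the flag ASSUMED to match each field.
    static Map<String, Double> guiSettings() {
        Map<String, Double> gui = new LinkedHashMap<>();
        gui.put("-P", 0.0);     // periodicPruning
        gui.put("-M", 3.0);     // minWordFrequency
        gui.put("-norm", 1.0);  // norm
        gui.put("-lnorm", 2.0); // Lnorm
        return gui;
    }

    // Returns the number that follows the flag, or null if the flag is absent.
    static Double numericOption(String options, String flag) {
        Pattern p = Pattern.compile("(?:^|\\s)" + Pattern.quote(flag) + "\\s+(\\S+)");
        Matcher m = p.matcher(options);
        return m.find() ? Double.valueOf(m.group(1)) : null;
    }

    // Lists every flag whose value in the option string differs from the GUI value.
    static List<String> mismatches(String options, Map<String, Double> gui) {
        List<String> diffs = new ArrayList<>();
        for (Map.Entry<String, Double> e : gui.entrySet()) {
            Double inCode = numericOption(options, e.getKey());
            if (inCode == null || Double.compare(inCode, e.getValue()) != 0) {
                diffs.add(e.getKey() + ": GUI=" + e.getValue() + ", code=" + inCode);
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        for (String d : mismatches(CODE_OPTIONS, guiSettings())) {
            System.out.println(d);
        }
    }
}
```

With the values from the question, the only numeric difference reported is the -M one. If the accuracy still differs after aligning it, the stop-word file path and the tokenizer delimiters would be the next things I would compare.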
