I have a small corpus and I want to calculate the accuracy of a naive Bayes classifier using 10-fold cross-validation. How can I do it?
Inspired by Jared's answer, here is a version using a generator. I am assuming that your data set X has N data points (N = 4 in the example) and D features (D = 2 in the example), and that the associated N labels are stored in y.
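A generator-based k-fold split along these lines might look like the following sketch (the helper kfold_indices and the toy X and y are my own illustration, not the original answer's code):

```python
import numpy as np

def kfold_indices(n, k):
    """Yield (train_idx, test_idx) index pairs for k-fold CV over n samples."""
    folds = np.array_split(np.arange(n), k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# Toy data set with N = 4 data points and D = 2 features:
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([0, 0, 1, 1])

for train_idx, test_idx in kfold_indices(len(X), 2):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # ... fit and evaluate a classifier on each split here ...
```

Because it is a generator, each split is produced lazily, which keeps memory use low for large data sets.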
I've used both libraries: NLTK for the naive Bayes classifier and scikit-learn for the cross-validation, as follows, and at the end I calculated the average accuracy across the folds.
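A sketch of that combination (the toy labeled_data corpus here is my own stand-in; with a real corpus you would build the (feature_dict, label) pairs from your documents):

```python
from nltk.classify import NaiveBayesClassifier, accuracy
from sklearn.model_selection import KFold

# Toy labeled corpus (illustrative): a list of (feature_dict, label) pairs.
labeled_data = [({'token': 'w%d' % i, 'is_even': i % 2 == 0},
                 'pos' if i % 2 == 0 else 'neg')
                for i in range(20)]

kf = KFold(n_splits=10, shuffle=True, random_state=0)
accuracies = []
for train_idx, test_idx in kf.split(labeled_data):
    train_set = [labeled_data[i] for i in train_idx]
    test_set = [labeled_data[i] for i in test_idx]
    classifier = NaiveBayesClassifier.train(train_set)   # NLTK naive Bayes
    accuracies.append(accuracy(classifier, test_set))    # NLTK accuracy helper

avg_accuracy = sum(accuracies) / len(accuracies)
print(avg_accuracy)
```

scikit-learn's KFold only produces the index splits; the training and scoring are done entirely with NLTK.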
NLTK doesn't directly support cross-validation for machine learning algorithms, so your options are either to set this up yourself or to use something like NLTK-Trainer. I'd probably recommend just using another module to do this for you, but if you really want to write your own code, you could do something like the following.
Supposing you want 10-fold cross-validation, you would have to partition your training set into 10 subsets, train on 9/10, test on the remaining 1/10, and do this for each choice of held-out subset (10 combinations in total). Assuming your training set is in a list named training, a simple way to accomplish this would be the following.

Actually, there is no need for the long loop iterations provided in the most upvoted answer. Also, the choice of classifier is irrelevant (it can be any classifier).
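To illustrate the manual approach described above, here is a sketch of my own (it assumes training is a list of (feature_dict, label) pairs and uses NLTK's naive Bayes, but any classifier could be substituted):

```python
from nltk.classify import NaiveBayesClassifier, accuracy

def manual_kfold_accuracy(training, k=10):
    """Split `training` into k folds by slicing: train on k-1 folds, test on 1."""
    fold_size = len(training) // k   # any remainder items are simply dropped
    scores = []
    for i in range(k):
        test_set = training[i * fold_size:(i + 1) * fold_size]
        train_set = training[:i * fold_size] + training[(i + 1) * fold_size:]
        classifier = NaiveBayesClassifier.train(train_set)
        scores.append(accuracy(classifier, test_set))
    return sum(scores) / len(scores)

# Toy usage with an illustrative corpus of (feature_dict, label) pairs:
training = [({'is_even': i % 2 == 0}, 'pos' if i % 2 == 0 else 'neg')
            for i in range(20)]
print(manual_kfold_accuracy(training, k=10))
```

In practice you would shuffle training first so that the folds aren't biased by the order of the corpus.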
Scikit-learn provides cross_val_score, which does all the looping under the hood.
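For example (using the iris data set purely as stand-in data; with a text corpus you would first vectorize the documents into a feature matrix X and label array y):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Stand-in data; substitute your own feature matrix X and labels y.
X, y = load_iris(return_X_y=True)

# 10-fold cross-validation; scoring defaults to accuracy for classifiers.
scores = cross_val_score(GaussianNB(), X, y, cv=10)
print(scores)          # one accuracy value per fold
print(scores.mean())   # average accuracy across the 10 folds
```

Any classifier with fit/predict can be passed in place of GaussianNB, which is the point made above: the cross-validation loop is independent of the model.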