This question may seem a little stupid, but I couldn't wrap my head around it. What is the purpose of test data? Is it only to calculate the accuracy of the classifier? I'm using Naive Bayes for sentiment analysis of tweets. Once I train my classifier on the training data, I use the test data just to calculate the classifier's accuracy. How can I use the test data to improve the classifier's performance?
You don't -- like you surmise, the test data is used for testing, and mustn't be used for anything else, lest you skew your accuracy measurements. This is an important cornerstone of any machine learning -- you only fool yourself if you use your test data for training.
If you are considering desperate measures like that, the proper way forward is usually to re-examine your problem space and the solution you have. Does it adequately model the problem you are trying to solve? If not, can you devise a better model which captures the essence of the problem?
Machine learning is not a silver bullet. It will not solve your problem for you. Too many failed experiments prove over and over again, "garbage in -- garbage out".
In general supervised machine learning, the test data set plays a critical role in determining how well your model is performing. You typically build a model with, say, 90% of your input data, leaving 10% aside for testing. You then check the accuracy of that model by seeing how well it does against the 10% test set. The performance of the model against the test data is meaningful because the model has never "seen" this data. If the model is statistically valid, it should perform well on both the training and test data sets. Holding out data like this is the basic step behind cross validation, where the split is repeated over several different partitions and the scores are averaged; you can read more about it here.
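To make the 90/10 idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the toy `DATA` list is made up (it is not the asker's tweets), and `train_nb`, `predict`, `accuracy`, `holdout_split` and `cross_validate` are hypothetical helper names, with a very small multinomial Naive Bayes standing in for whatever classifier you actually use. The point is only that the model is fit on the training part and scored on data it has never seen.

```python
import math
import random
from collections import Counter, defaultdict

# Made-up labelled "tweets" for illustration only; load your own data instead.
DATA = [
    ("i love this phone", "pos"), ("i hate this phone", "neg"),
    ("what a great day", "pos"), ("what a terrible day", "neg"),
    ("love the new update", "pos"), ("hate the new update", "neg"),
    ("great service very happy", "pos"), ("terrible service very angry", "neg"),
    ("happy with my order", "pos"), ("angry about my order", "neg"),
    ("this movie was great", "pos"), ("this movie was awful", "neg"),
    ("really love the food", "pos"), ("really hate the food", "neg"),
    ("best day ever", "pos"), ("worst day ever", "neg"),
    ("so happy today", "pos"), ("so sad today", "neg"),
    ("great game last night", "pos"), ("awful game last night", "neg"),
]

def train_nb(examples):
    """Count words per label (a tiny multinomial Naive Bayes 'fit')."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(predict(model, t) == y for t, y in examples) / len(examples)

def holdout_split(examples, test_fraction=0.1, seed=0):
    """Shuffle once, then set aside test_fraction of the data for testing."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def cross_validate(examples, k=5, seed=0):
    """Repeat the hold-out idea k times so every example is tested once."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i, test_fold in enumerate(folds):
        train_part = [ex for j, f in enumerate(folds) if j != i for ex in f]
        scores.append(accuracy(train_nb(train_part), test_fold))
    return scores

train, test = holdout_split(DATA, test_fraction=0.1)
model = train_nb(train)  # the test examples are never shown to train_nb
print("train accuracy:", accuracy(model, train))
print("held-out test accuracy:", accuracy(model, test))
print("5-fold accuracies:", cross_validate(DATA, k=5))
```

On a data set this small the held-out number is very noisy, which is one reason people average over several folds instead of trusting a single split. A training accuracy far above the held-out accuracy is the usual sign that the model is memorising rather than generalising.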