I passed two streams of data to the sgd_clf classifier, as shown in the code below. The first partial_fit call takes the first stream (x1, y1); the second partial_fit call takes the second stream (x2, y2).
The code below raises an error at the second partial_fit step, saying that the class labels must be included beforehand. The error goes away when I include all the data from x2, y2 in x1, y1 (so all class labels have been seen before the second partial_fit call).
However, I cannot provide the x2, y2 data in advance. And if I have to provide all my data before the first partial_fit(), why would I need a second partial_fit() at all? In fact, if I knew all the data beforehand, I wouldn't need partial_fit(); I could just call fit().
from sklearn import linear_model
import numpy as np

def train_new_data():
    sgd_clf = linear_model.SGDClassifier()
    x1 = [[8, 9], [20, 22]]
    y1 = [5, 6]
    classes = np.unique(y1)
    #print(classes)
    sgd_clf.partial_fit(x1, y1, classes=classes)
    x2 = [10, 12]
    y2 = 8
    sgd_clf.partial_fit([x2], [y2], classes=classes)  # Error here!!
    return sgd_clf

if __name__ == "__main__":
    print(train_new_data().predict([[20, 22]]))
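For context, here is a variant I would expect to work based on my reading of the docs: the full set of labels I ever expect to see (here assumed to be 5, 6, and 8) is passed as classes on the first partial_fit call, and later calls stream in new data without needing classes again.

```python
from sklearn import linear_model
import numpy as np

sgd_clf = linear_model.SGDClassifier()
all_classes = np.array([5, 6, 8])  # assumption: every label I ever expect to see

# first stream
x1 = [[8, 9], [20, 22]]
y1 = [5, 6]
sgd_clf.partial_fit(x1, y1, classes=all_classes)

# second stream arrives later; classes is only required on the first call
x2 = [[10, 12]]
y2 = [8]
sgd_clf.partial_fit(x2, y2)

print(sgd_clf.predict([[20, 22]]))
```

But this only helps when the set of labels is known upfront, even if the data itself is not.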
Q1: Is my understanding of partial_fit() for sklearn classifiers wrong, namely that it takes data on the fly, as described here: Incremental Learning?
Q2: I want to update a model with new data rather than train from scratch. Will partial_fit help me with this?
Q3: I am not tied to SGDClassifier; I can use any algorithm that supports online/batch learning. This is my main concern. I have a model trained on thousands of images. I don't want to retrain it from scratch just because I have one or two new image samples. Nor am I interested in creating a new model for each new entry and then combining them all, since searching across all those trained models degrades my prediction performance. I just want to add the new data instances to the trained model with the help of partial_fit. Is this feasible?
Q4: If I cannot achieve Q2 with scikit-learn classifiers, please direct me to how I can achieve it.
Any suggestions, ideas, or references are much appreciated.