I am using the ALS implementation from Apache Spark MLlib (via the PySpark API) to develop a service that performs live recommendations for anonymous users (users not in the training set) on my site.
In my use case I train the model on the user ratings like this:
from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating
# df is an RDD of (userId, itemId, rating) rows
ratings = df.map(lambda l: Rating(int(l[0]), int(l[1]), float(l[2])))
rank = 10
numIterations = 10
model = ALS.trainImplicit(ratings, rank, numIterations)
Now, each time an anonymous user selects an item in the catalogue, I want to fold that user's vector into the ALS model and get recommendations for them (just like the recommendProducts() call), but without re-training the whole model.
Is there an easy way to fold in the vector of a new anonymous user after the ALS model has been trained in Apache Spark?
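To make the goal concrete, this is roughly the computation I have in mind, done by hand on the driver with numpy: keep the learned item factors fixed, solve a small ridge regression for the new user's vector (ignoring the implicit-feedback confidence weighting for simplicity), then score all items with that vector. The fold_in_user / recommend_for_vector helpers, the interacted_items input and the reg value below are placeholders of mine, not part of the Spark API; productFeatures() is the only model call used:

import numpy as np

# Learned item factors; productFeatures() returns an RDD of (itemId, factor array) pairs.
# With rank = 10 this easily fits in the driver's memory.
item_factors = dict(model.productFeatures().collect())

def fold_in_user(interacted_items, reg=0.1):
    # interacted_items: {itemId: rating/strength} for the anonymous user (placeholder input).
    # Keep the item factors Y fixed and solve (Y^T Y + reg*I) x = Y^T r for the user vector x.
    items, strengths = zip(*interacted_items.items())
    Y = np.array([item_factors[i] for i in items])   # shape: (items seen, rank)
    r = np.array(strengths, dtype=float)
    A = Y.T.dot(Y) + reg * np.eye(Y.shape[1])
    return np.linalg.solve(A, Y.T.dot(r))

def recommend_for_vector(user_vec, num=10):
    # Score every item by its dot product with the folded-in user vector,
    # mimicking recommendProducts() for a user the model has never seen.
    scores = ((item, float(np.dot(user_vec, np.asarray(f)))) for item, f in item_factors.items())
    return sorted(scores, key=lambda s: s[1], reverse=True)[:num]

For example, recommend_for_vector(fold_in_user({some_item_id: 1.0})) would give the top 10 items for a visitor who has only clicked some_item_id (again, a made-up id). I would prefer not to maintain this by hand if Spark can do it for me.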
Thanks in advance
There are a few open-source "model server" solutions that I have seen advertised, but I have no hands-on experience with them. I have also heard of a commercial offering, but I can't remember its name right now.
So form your own opinion, and keep an eye out for possible alternatives.
PredictionIO (the start-up was gobbled up by Salesforce, but their solution is still available) uses a Spark + Hadoop + HBase stack, plus some kind of web server component.
MLeap is yet-another-ML-library-with-limited-feature-set, which can be plugged into Spark/Scikit-Learn/whatever and can spawn a web service -- or export your model to a hosted solution named Combust.ml.
MLDB is yet-another-ML-library-with-limited-feature-set, completely outside of the Python/Spark ecosystem, but claims full integration with TensorFlow -- including the ability to import existing Deep Learning models and tweak them for different uses.