I have a classifier that I trained using Python's scikit-learn. How can I use the classifier from a Java program? Can I use Jython? Is there some way to save the classifier in Python and load it in Java? Is there some other way to use it?
相关问题
- Delete Messages from a Topic in Apache Kafka
- how to define constructor for Python's new Nam
- Jackson Deserialization not calling deserialize on
- streaming md5sum of contents of a large remote tar
- How to maintain order of key-value in DataFrame sa
Here is some code for the JPMML solution:
--PYTHON PART--
--JAVA PART --
You can either use a porter, I have tested the sklearn-porter (https://github.com/nok/sklearn-porter), and it works well for Java.
My code is the following:
In my case, I'm using a DecisionTreeClassifier, and the output of
is the following code as text in the console:
I found myself in a similar situation. I'll recommend carving out a classifier microservice. You could have a classifier microservice which runs in python and then expose calls to that service over some RESTFul API yielding JSON/XML data-interchange format. I think this is a cleaner approach.
You cannot use jython as scikit-learn heavily relies on numpy and scipy that have many compiled C and Fortran extensions hence cannot work in jython.
The easiest ways to use scikit-learn in a java environment would be to:
expose the classifier as a HTTP / Json service, for instance using a microframework such as flask or bottle or cornice and call it from java using an HTTP client library
write a commandline wrapper application in python that reads data on stdin and output predictions on stdout using some format such as CSV or JSON (or some lower level binary representation) and call the python program from java for instance using Apache Commons Exec.
make the python program output the raw numerical parameters learnt at fit time (typically as an array of floating point values) and reimplement the predict function in java (this is typically easy for predictive linear models where the prediction is often just a thresholded dot product).
The last approach will be a lot more work if you need to re-implement feature extraction in Java as well.
Finally you can use a Java library such as Weka or Mahout that implement the algorithms you need instead of trying to use scikit-learn from Java.
There is JPMML project for this purpose.
First, you can serialize scikit-learn model to PMML (which is XML internally) using sklearn2pmml library directly from python or dump it in python first and convert using jpmml-sklearn in java or from a command line provided by this library. Next, you can load pmml file, deserialize and execute loaded model using jpmml-evaluator in your Java code.
This way works with not all scikit-learn models, but with many of them.