Get actual field name from JPMML model's Input

I have a scikit-learn model that I'm using in my Java app via JPMML. I'm trying to set the InputFields using the names of the columns that were used during training, but "inField.getName().getValue()" returns generic names of the form "x{#}" (x1, x2, ...). Is there any way I could map "x{#}" back to the original feature/attribute names?

Map<FieldName, FieldValue> arguments = new LinkedHashMap<>();
for (InputField inField : patternEvaluator.getInputFields()) {
    // 1 if this field's name is one of the active features, 0 otherwise;
    // the lookup never matches because the names come back as "x1", "x2", ...
    int value = activeFeatures.contains(inField.getName().getValue()) ? 1 : 0;
    FieldValue inputFieldValue = inField.prepare(value);
    arguments.put(inField.getName(), inputFieldValue);
}
Map<FieldName, ?> results = patternEvaluator.evaluate(arguments);

Here's how I'm generating the model:

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn2pmml import PMMLPipeline, sklearn2pmml

data = pd.read_csv('/pydata/training.csv')
# All columns except the last are features; 'classname' is the target
X = data[data.columns[:-1]].values
y = data['classname'].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# 'sqrt' is what max_features='auto' meant for classifiers in older scikit-learn
estimators = [("read", RandomForestClassifier(n_jobs=5, n_estimators=200, max_features='sqrt'))]
pipe = PMMLPipeline(estimators)
pipe.fit(X_train, y_train)
# Feature names only; the last column ('classname') is the target
pipe.active_fields = np.array(data.columns[:-1])
sklearn2pmml(pipe, "/pydata/model.pmml", with_repr=True)

Thanks

Tags: java scikit-learn pmml

2 answers

骚年 ilove

Post #2 · 2019-07-24 17:44

Your pipeline only includes the estimator; that is why the names are lost. You have to include all the preprocessing steps in the pipeline as well in order to get the names into the PMML.

Assuming you do no preprocessing at all, this is probably what you need (I do not repeat the parts of your code that this snippet also relies on):

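from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn_pandas import DataFrameMapper
from sklearn2pmml import PMMLPipeline, sklearn2pmml

# Keep X as a DataFrame (not .values) so the column names reach the PMML
feature_cols = list(data.columns[:-1])  # everything except the 'classname' target
X_train, X_test, y_train, y_test = train_test_split(
    data[feature_cols], data['classname'], test_size=0.3, random_state=0)

# Pass every feature column through unchanged (None = no transformation)
nones = [(d, None) for d in feature_cols]
mapper = DataFrameMapper(nones, df_out=True)

lm = PMMLPipeline([
    ("mapper", mapper),
    ("estimator", RandomForestClassifier(n_jobs=5, n_estimators=200, max_features='sqrt'))
])

lm.fit(X_train, y_train)

sklearn2pmml(lm, "ScikitLearnNew.pmml", with_repr=True)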

If you do need to preprocess your data, you can use any other transformer (e.g. LabelBinarizer) instead of None. Either way, the preprocessing has to happen inside the pipeline in order to be included in the PMML.


我欲成王，谁敢阻挡

Post #3 · 2019-07-24 17:53

Does the PMML document contain the actual field names at all? Open it in a text editor and check the values of the /PMML/DataDictionary/DataField@name attributes.
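If you would rather check this from a script than by eye, the following is a minimal sketch using only Python's standard library. It assumes the file is a normal PMML document with a default namespace (e.g. `http://www.dmg.org/PMML-4_3`); the helper name `data_field_names` and the sample document are hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET

def data_field_names(root):
    """Return the name attribute of every DataField element, in document order."""
    names = []
    for elem in root.iter():
        # PMML files declare a namespace, so tags look like "{http://...}DataField";
        # compare only the local part of the tag
        local = elem.tag.rsplit("}", 1)[-1]
        if local == "DataField":
            names.append(elem.get("name"))
    return names

# Hypothetical PMML fragment: the features have lost their names
sample = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <DataDictionary>
    <DataField name="classname" optype="categorical" dataType="string"/>
    <DataField name="x1" optype="continuous" dataType="double"/>
  </DataDictionary>
</PMML>"""

print(data_field_names(ET.fromstring(sample)))  # ['classname', 'x1']
```

For a real file, pass `ET.parse("/pydata/model.pmml").getroot()` instead. If the output shows `x1`, `x2`, ... rather than your column names, the names were never written into the PMML.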

Your question suggests that the conversion from Scikit-Learn to PMML was incomplete: it didn't include the active field (aka input field) names. In that case they are assumed to be x1, x2, ..., xn.
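If re-exporting the model is not an option, one possible workaround is to map the default names back by position. This is a sketch under an assumption, not something JPMML provides: it assumes `x1` corresponds to the first feature column of the training matrix, `x2` to the second, and so on, in the same order as `data.columns[:-1]`. The helper `map_default_names` and the column names are hypothetical.

```python
import re

def map_default_names(field_names, feature_columns):
    """Map default PMML names like 'x3' to training column names by position.

    Assumption: 'x1' is feature_columns[0], 'x2' is feature_columns[1], etc.
    Names that are not of the form x<number> are returned unchanged.
    """
    mapping = {}
    for name in field_names:
        m = re.fullmatch(r"x(\d+)", name)
        if m is None:
            mapping[name] = name  # already a real name (e.g. the target)
            continue
        index = int(m.group(1)) - 1  # x1 -> position 0
        if not 0 <= index < len(feature_columns):
            raise ValueError(f"{name} has no matching training column")
        mapping[name] = feature_columns[index]
    return mapping

columns = ["age", "height", "weight"]  # hypothetical feature columns
print(map_default_names(["x1", "x2", "x3"], columns))
# {'x1': 'age', 'x2': 'height', 'x3': 'weight'}
```

The same index arithmetic could be applied inside the Java loop before calling `activeFeatures.contains(...)`, but it silently gives wrong answers if the column order at training time differs from what you pass in, so fixing the export is the more robust route.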
