I have modified the OneHotEncoder example to actually train a LogisticRegression. My question is how to map the generated weights back to the categorical variables?
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
import org.apache.spark.sql.SQLContext

def oneHotEncoderExample(sqlContext: SQLContext): Unit = {
  val df = sqlContext.createDataFrame(Seq(
    (0, "a", 1.0),
    (1, "b", 1.0),
    (2, "c", 0.0),
    (3, "d", 1.0),
    (4, "e", 1.0),
    (5, "f", 0.0)
  )).toDF("id", "category", "label")
  df.show()

  // Map each category string to a numeric index.
  val indexer = new StringIndexer()
    .setInputCol("category")
    .setOutputCol("categoryIndex")
    .fit(df)
  val indexed = indexer.transform(df)
  indexed.select("id", "categoryIndex").show()

  // One-hot encode the category index into the "features" vector column.
  val encoder = new OneHotEncoder()
    .setInputCol("categoryIndex")
    .setOutputCol("features")
  val encoded = encoder.transform(indexed)
  encoded.select("id", "features").show()

  val lr = new LogisticRegression()
    .setMaxIter(10)
    .setRegParam(0.01)
  val pipeline = new Pipeline()
    .setStages(Array(indexer, encoder, lr))

  // Fit the pipeline to the training data.
  val pipelineModel = pipeline.fit(df)
  val lorModel = pipelineModel.stages.last.asInstanceOf[LogisticRegressionModel]
  println(s"LogisticRegression: $lorModel")

  // Print the weights and intercept for logistic regression.
  println(s"Weights: ${lorModel.coefficients} Intercept: ${lorModel.intercept}")
}
This outputs:
Weights: [1.5098946631236487,-5.509833649232324,1.5098946631236487,1.5098946631236487,-5.509833649232324] Intercept: 2.6679020381781235
I assume what you want here is access to the features metadata. Let's start by transforming the existing DataFrame; next, you can extract the metadata object:
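A minimal sketch of these two steps, assuming the `pipelineModel` and `df` from the question are in scope. `columnMetadata` is a hypothetical helper name used only for illustration, not part of Spark:

```scala
import org.apache.spark.sql.types.{Metadata, StructType}

// Hypothetical helper: return the metadata attached to one column of a schema.
// The default column name "features" matches the encoder output in the question.
def columnMetadata(schema: StructType, column: String = "features"): Metadata =
  schema(column).metadata

// Usage with the names from the question (assumed to be in scope):
// val transformedDF = pipelineModel.transform(df)
// val meta = columnMetadata(transformedDF.schema)
```

The metadata lives on the schema, so no rows need to be collected to read it.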
Finally, let's extract the attributes:
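One way this could look, reading the raw metadata directly. This sketch assumes the layout Spark ML writes for a one-hot encoded column: an `ml_attr` entry holding `attrs`, with the encoded columns listed under `binary`, each carrying an `idx` and a `name`. `binaryAttributeNames` is a hypothetical helper name:

```scala
import org.apache.spark.sql.types.Metadata

// Hypothetical helper: list (index, name) pairs of the binary attributes
// stored in a features column's metadata.
// Assumes the "ml_attr" -> "attrs" -> "binary" layout described above.
def binaryAttributeNames(meta: Metadata): Seq[(Long, String)] =
  meta
    .getMetadata("ml_attr")
    .getMetadata("attrs")
    .getMetadataArray("binary")
    .map(attr => (attr.getLong("idx"), attr.getString("name")))
    .toSeq
```

Each `idx` is a position in the feature vector, and each `name` is the original category value that position encodes.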
These can be used to relate weights back to the original features.
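As a rough sketch of that last step, the attribute indices can be used to look up the matching entries in the weight vector. This version goes through the `AttributeGroup` API in `org.apache.spark.ml.attribute` rather than the raw metadata keys. `weightsByName` is a hypothetical helper name, and `transformedDF` and `lorModel` in the usage comment are assumed to be in scope:

```scala
import org.apache.spark.ml.attribute.{Attribute, AttributeGroup}
import org.apache.spark.sql.types.StructField

// Hypothetical helper: pair each named attribute of a vector column
// with the weight at the same index.
def weightsByName(field: StructField, weights: Array[Double]): Seq[(String, Double)] = {
  val attrs = AttributeGroup.fromStructField(field)
    .attributes
    .getOrElse(Array.empty[Attribute])
  // Attributes without a name or an index are skipped.
  attrs.toSeq.flatMap { attr =>
    for (idx <- attr.index; name <- attr.name) yield name -> weights(idx)
  }
}

// Usage with the names from the question (assumed to be in scope):
// weightsByName(transformedDF.schema("features"), lorModel.coefficients.toArray)
```

Note that `OneHotEncoder` drops the last category by default (`dropLast`), which is why six categories yield only five weights in the question. The dropped category has no weight of its own and serves as the baseline.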