I want to tune my model with grid search and cross-validation in Spark. In Spark, the base model must be put in a pipeline. The official pipeline demo uses LogisticRegression as the base model, which can be instantiated with `new`. However, the RandomForest model cannot be instantiated by client code, so it seems impossible to use RandomForest in the pipeline API. I don't want to reinvent the wheel, so can anybody give some advice?
Thanks
Well, that is true, but you are simply trying to use the wrong class. Instead of `mllib.tree.RandomForest` you should use `ml.classification.RandomForestClassifier`. Here is an example based on the one from the MLlib docs. There is one thing I couldn't figure out here: as far as I can tell, it should be possible to use labels extracted from `LabeledPoint`s directly, but for some reason it doesn't work and `pipeline.fit` raises an `IllegalArgumentException`:
Hence the ugly trick with `StringIndexer`. After applying it we get the required label attributes (`{"vals":["1.0","0.0"],"type":"nominal","name":"label"}`), but some classes in `ml` seem to work just fine without it.