I noticed there are two LinearRegressionModel classes in Spark: one in the ML package and another in the MLlib package.
The two are implemented quite differently. For example, the one from MLlib implements Serializable, while the other one does not.
By the way, the same is true of RandomForestModel.
Why are there two classes? Which is the "right" one? And is there a way to convert one into the other?
o.a.s.mllib contains the old RDD-based API, while o.a.s.ml contains the new API built around Dataset and ML Pipelines. ml and mllib reached feature parity in 2.0.0, and mllib is slowly being deprecated (this has already happened in the case of linear regression) and will most likely be removed in the next major release.
So unless your goal is backward compatibility, the "right choice" is o.a.s.ml.
Spark MLlib
spark.mllib contains the legacy API built on top of RDDs.
Spark ML
spark.ml provides a higher-level API built on top of DataFrames for constructing ML pipelines.
According to [the official announcement
For more information, see the docs: https://spark.apache.org/docs/latest/ml-guide.html
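On the conversion part of the question: as far as I know there is no built-in converter between the two model classes, but for a linear model you can rebuild the RDD-based model from the coefficients and intercept of the DataFrame-based one. Below is a minimal sketch, assuming Spark 2.0 or later running in local mode. The object name ConvertLinearRegression, the helpers toMllib and demo, and the toy data are made up for illustration. Vectors.fromML and the public (weights, intercept) constructor of the mllib LinearRegressionModel are the only pieces that bridge the two packages.

```scala
import org.apache.spark.ml.linalg.{Vectors => MLVectors}
import org.apache.spark.ml.regression.{LinearRegression, LinearRegressionModel => MLModel}
import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
import org.apache.spark.mllib.regression.{LinearRegressionModel => OldModel}
import org.apache.spark.sql.SparkSession

object ConvertLinearRegression {

  // Hypothetical helper: wrap the fitted coefficients and intercept of a
  // spark.ml model in the RDD-based spark.mllib model class.
  def toMllib(model: MLModel): OldModel =
    new OldModel(OldVectors.fromML(model.coefficients), model.intercept)

  // Hypothetical demo: fit with spark.ml, convert, and return the prediction
  // of each model for the same input so they can be compared.
  def demo(): (Double, Double) = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("ml-to-mllib")
      .getOrCreate()
    import spark.implicits._

    try {
      // Toy training data (label, features), invented for this example.
      val training = Seq(
        (1.0, MLVectors.dense(0.0)),
        (3.0, MLVectors.dense(1.0)),
        (5.0, MLVectors.dense(2.0))
      ).toDF("label", "features")

      val mlModel = new LinearRegression().fit(training)
      val oldModel = toMllib(mlModel)

      // spark.ml models score DataFrames; spark.mllib models score vectors.
      val probe = Seq(Tuple1(MLVectors.dense(3.0))).toDF("features")
      val fromMl = mlModel.transform(probe).head().getAs[Double]("prediction")
      val fromOld = oldModel.predict(OldVectors.dense(3.0))

      (fromMl, fromOld)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (fromMl, fromOld) = demo()
    println(s"spark.ml: $fromMl, spark.mllib: $fromOld")
  }
}
```

Both models compute the dot product of the weights with the input plus the intercept, so the two predictions should agree up to floating-point error. This trick only covers models that are fully described by a weight vector and an intercept. It does not carry over directly to tree ensembles such as random forests.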