What algorithm is used in spark decision tree (is

Posted 2020-04-03 03:59

I have a question about decision tree in MLlib. What algorithm is used in Spark? Is it ID3, C4.5 or CART?

Tags: apache-spark tree

2 answers

别忘想泡老子

Reply #2 · 2020-04-03 04:36

If you look at the linked Apache Spark documentation, in the section

Node impurity and information gain (Basic Algorithm)

you will find:

The current implementation provides two impurity measures for classification (Gini impurity and entropy) and one impurity measure for regression (variance)

Also, the linked Decision Tree article says that the CART (classification and regression tree) algorithm uses Gini impurity and entropy for classification and variance reduction for regression. These are the same measures Spark provides, which points to CART.
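To make the three impurity measures quoted above concrete, here is a minimal plain-Python sketch of how each can be computed for the labels (or target values) that reach a node. This is not Spark's code: the function names are hypothetical, and the base-2 logarithm for entropy and the population (divide by n) form of variance are assumptions made for illustration.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class frequencies."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Entropy: minus the sum of f * log2(f) over class frequencies f.

    The base-2 logarithm is an assumption made for this sketch.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum(
        (count / n) * math.log2(count / n) for count in Counter(labels).values()
    )


def variance(values):
    """Population variance of the regression targets at a node."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n
```

A pure node (all labels equal) scores 0 under all three measures, and an evenly mixed two-class node scores highest, so a split is judged by how much it lowers the impurity of the children relative to the parent.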


Emotional °昔

Reply #3 · 2020-04-03 04:46

Spark MLlib is using the ID3 algorithm with CART.

ID3 only handles categorical variables, whereas CART can also handle continuous variables. Spark decision trees can handle continuous variables, so Spark is using CART (the Jira ticket mentioned below shows that C4.5 has not been implemented yet).
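As an illustration of what handling a continuous variable means in a CART-style tree, here is a minimal plain-Python sketch of a binary threshold split chosen by Gini impurity reduction. It is not Spark's implementation: the names `gini` and `best_threshold` are hypothetical, and testing every midpoint between sorted values is an assumption made for simplicity (MLlib's documentation describes choosing among a binned set of candidate split points instead).

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the best split `value <= threshold`.

    The gain is the parent impurity minus the size-weighted impurity of
    the two children. Returns (None, 0.0) if no split improves on the parent.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    parent = gini(labels)
    best_thr, best_gain = None, 0.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical feature values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
        gain = parent - weighted
        if gain > best_gain:
            best_thr, best_gain = (lo + hi) / 2, gain
    return best_thr, best_gain
```

For example, `best_threshold([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])` picks the midpoint 2.5, which separates the two classes completely. Classic ID3 has no such thresholding step, because it branches once per category value.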

This blog post has some information about the different algorithms; it is where I got the answer from.

You can find a discussion on extending it to C4.5 in this Jira ticket.

More information about the differences between the algorithms is available here.
