What is a Learning Curve in machine learning?

Posted 2020-02-16 06:56

I want to know what a learning curve in machine learning is. What is the standard way of plotting it? That is, what should the x and y axes of my plot be?

9 answers
我只想做你的唯一
Reply #2 · 2020-02-16 07:08

It usually refers to a plot of the prediction accuracy/error vs. the training set size (i.e., how much better the model gets at predicting the target as you increase the number of instances used to train it).

[Figure: learning curve]

Usually both the training and the test/validation performance are plotted together so we can diagnose the bias-variance tradeoff (i.e., determine whether we would benefit from adding more training data, and assess the model complexity by controlling regularization or the number of features).

[Figure: bias-variance]
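As a minimal sketch of how such a curve is computed, assume a toy problem: synthetic 1-D data drawn from a hypothetical y = 2x + 1 plus Gaussian noise (sigma = 1), and a straight-line least-squares model written in plain Python so it has no dependencies. The helper names (fit_line, mse, compute_learning_curve, make_points) are illustrative only; in practice you would use a library helper such as scikit-learn's learning_curve with your own model.

```python
import random


def fit_line(xs, ys):
    """Closed-form least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def mse(model, points):
    """Mean squared error of the line `model` on a list of (x, y) points."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in points) / len(points)


def compute_learning_curve(train, val, sizes):
    """For each training-set size m, fit on the first m training points and
    return (m, training error, validation error)."""
    curve = []
    for m in sizes:
        subset = train[:m]
        xs = [x for x, _ in subset]
        ys = [y for _, y in subset]
        model = fit_line(xs, ys)
        curve.append((m, mse(model, subset), mse(model, val)))
    return curve


def make_points(rng, n):
    """Hypothetical synthetic data: y = 2x + 1 plus Gaussian noise (sigma = 1)."""
    points = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        points.append((x, 2 * x + 1 + rng.gauss(0, 1)))
    return points


rng = random.Random(0)  # fixed seed so the run is repeatable
train_set = make_points(rng, 200)
val_set = make_points(rng, 500)
curve = compute_learning_curve(train_set, val_set, [2, 5, 10, 20, 50, 100, 200])
for m, train_err, val_err in curve:
    print(f"m={m:3d}  train MSE={train_err:.3f}  val MSE={val_err:.3f}")
```

Plotting m on the x axis against the two error columns on the y axis gives the usual picture: training error starts at 0 for m = 2 (a line passes exactly through two points) and typically rises toward the noise level, while validation error typically falls toward it. Once the two curves have converged, more data of the same kind is unlikely to help this particular model.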

祖国的老花朵
Reply #3 · 2020-02-16 07:13

Basically, a learning curve lets you find the point at which the algorithm starts to learn. If you take the tangent (the derivative) of the curve, the point where the slope starts to level off toward a constant is where the algorithm has built up its learning ability.

Depending on how your x and y axes are mapped, the values on one axis will start to approach a constant while the values on the other axis keep increasing. This is where you start seeing some learning. Overall, the curve lets you measure the rate at which your algorithm learns. The maximum (or minimum) is usually reached where the slope starts to flatten out, and you can take derivative measurements up to that point.

So, as the examples above show, the curve gradually tends toward a constant value. The algorithm initially learns quickly from the training examples, and the slope flattens near the maximum/minimum as the curve gets closer and closer to its constant state. At that point the model is able to generalize to new examples in the test data. For this kind of plot you would use epochs on the x axis and error on the y axis.
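One way to make "the point where the slope levels off" concrete is to look at the per-epoch change in error and report the first epoch where it drops below a tolerance. The sketch below is a hypothetical illustration, not a standard rule: the toy data (y = 3x), learning rate, tolerance, and the function names train_losses and plateau_epoch are all assumptions. It uses plain-Python gradient descent to produce an epochs-vs-error curve.

```python
def train_losses(epochs, lr=0.01):
    """Gradient descent on a hypothetical toy problem: fit y = w*x to data
    generated by y = 3x, recording the mean squared error after each epoch."""
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [3.0 * x for x in xs]
    w = 0.0
    losses = []
    for _ in range(epochs):
        # d/dw of mean((w*x - y)^2) is mean(2*x*(w*x - y))
        grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
        losses.append(sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))
    return losses


def plateau_epoch(losses, tol=1e-3):
    """Return the first epoch (1-based) whose improvement over the previous
    epoch is below tol, i.e. where the curve has flattened; None if never."""
    for t in range(1, len(losses)):
        if abs(losses[t - 1] - losses[t]) < tol:
            return t + 1
    return None


losses = train_losses(epochs=100)
p = plateau_epoch(losses)
print(f"error curve flattens (improvement < 1e-3) at epoch {p}")
```

The tolerance is a design choice: a smaller tol reports a later epoch, and on noisy real-world curves you would usually smooth the losses (or require several consecutive small improvements) before applying such a check.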

够拽才男人
Reply #4 · 2020-02-16 07:15

Example: X = level, Y = salary

X (level)    Y (salary)
0            2000
2            4000
4            6000
6            8000

Linear regression gives 75% accuracy because it fits a straight line; polynomial regression gives 85% accuracy because it fits a curve.

SAY GOODBYE
Reply #5 · 2020-02-16 07:20

I just want to leave a brief note on this old question to point out that learning curves and ROC curves are not synonymous.

As indicated in the other answers to this question, a learning curve conventionally depicts improvement in performance on the vertical axis when there are changes in another parameter (on the horizontal axis), such as training set size (in machine learning) or iteration/time (in both machine and biological learning). One salient point is that many parameters of the model are changing at different points on the plot. Other answers here have done a great job of illustrating learning curves.

(There is also another meaning of learning curve in industrial manufacturing, originating in an observation in the 1930s that the number of labor hours needed to produce an individual unit decreases at a uniform rate as the quantity of units manufactured doubles. It isn't really relevant but is worth noting for completeness and to avoid confusion in web searches.)
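For reference, the manufacturing learning curve is usually attributed to T. P. Wright's 1936 study of aircraft production and is commonly modeled as a power law. In this sketch of the standard form, T_1 is the labor hours for the first unit, r is the learning rate (e.g. r = 0.8 for an "80% curve"), and T_n is the labor hours at unit n (conventions differ on whether T_n means the time for the n-th unit or the cumulative average over the first n units):

```latex
T_n = T_1 \, n^{b}, \qquad b = \log_2 r

T_{2n} = T_1 \, (2n)^{b} = 2^{b} \, T_1 \, n^{b} = r \, T_n
```

Each doubling of the quantity produced multiplies the labor hours by the same factor r, which is the "uniform rate" in the observation above.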

In contrast, Receiver Operating Characteristic curve, or ROC curve, does not show learning; it shows performance. An ROC curve is a graphical depiction of classifier performance that shows the trade-off between increasing true positive rates (on the vertical axis) and increasing false positive rates (on the horizontal axis) as the discrimination threshold of the classifier is varied. Thus, only a single parameter (the decision / discrimination threshold) associated with the model is changing at different points on the plot. This ROC curve (from Wikipedia) shows performance of three different classifiers.

[Figure: ROC curve from Wikipedia; see the previous link for CC licensing]

There is no learning depicted here, but rather performance with respect to two different classes of success/error as the classifier's decision threshold is made more lenient or strict. The area under the curve gives an overall indication of the classifier's ability to distinguish the classes. This area-under-the-curve metric is insensitive to the number of members in the two classes, so it may not reflect actual performance if class membership is unbalanced. The ROC curve has many subtleties; interested readers might check out:

Fawcett, Tom. "ROC graphs: Notes and practical considerations for researchers." Machine Learning 31 (2004): 1-38.

Swets, John A., Robyn M. Dawes, and John Monahan. "Better decisions through Science." Scientific American (2000): 83.
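To make the threshold sweep described above concrete, here is a minimal plain-Python sketch. The scores and labels are hypothetical, and roc_points and auc are illustrative helpers (in practice a library routine such as scikit-learn's roc_curve and roc_auc_score would be used). It computes the (FPR, TPR) points as the threshold is lowered, the area under the curve by the trapezoidal rule, and shows that duplicating the negative examples leaves the curve unchanged.

```python
def roc_points(scores, labels):
    """Sweep the decision threshold from high to low and return the
    (false positive rate, true positive rate) points of the ROC curve.
    labels are 1 (positive) or 0 (negative); both classes must be present."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Predict "positive" whenever score >= threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical classifier scores for six examples (1 = positive class).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
print(pts)
print(auc(pts))

# Duplicating every negative example changes the class ratio but not the ROC
# curve, because FPR and TPR are each normalized within their own class.
more_neg_scores = scores + [0.7, 0.55, 0.4]
more_neg_labels = labels + [0, 0, 0]
print(auc(roc_points(more_neg_scores, more_neg_labels)))
```

Each threshold moves only one parameter of the classifier, which is exactly the contrast with a learning curve drawn above: the points trace performance at different cut-offs, not improvement from training.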

叛逆
Reply #6 · 2020-02-16 07:25

Use this code to plot:

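import matplotlib.pyplot as plt

# history = model.fit(...) must be called with validation data
# (validation_data=... or validation_split=...) so the 'val_*' keys exist.

# Loss curves
plt.figure(figsize=[8, 6])
plt.plot(history.history['loss'], 'r', linewidth=3.0)
plt.plot(history.history['val_loss'], 'b', linewidth=3.0)
plt.legend(['Training loss', 'Validation loss'], fontsize=18)
plt.xlabel('Epochs', fontsize=16)
plt.ylabel('Loss', fontsize=16)
plt.title('Loss Curves', fontsize=16)

# Accuracy curves
# Older Keras versions store accuracy under 'acc'/'val_acc'; tf.keras 2.x
# uses 'accuracy'/'val_accuracy' when compiled with metrics=['accuracy'].
acc_key = 'accuracy' if 'accuracy' in history.history else 'acc'
plt.figure(figsize=[8, 6])
plt.plot(history.history[acc_key], 'r', linewidth=3.0)
plt.plot(history.history['val_' + acc_key], 'b', linewidth=3.0)
plt.legend(['Training accuracy', 'Validation accuracy'], fontsize=18)
plt.xlabel('Epochs', fontsize=16)
plt.ylabel('Accuracy', fontsize=16)
plt.title('Accuracy Curves', fontsize=16)

plt.show()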

Note that history is the History object returned by model.fit(...), i.e. history = model.fit(...).

等我变得足够好
Reply #7 · 2020-02-16 07:29

How can you determine, for a given model, whether more training points will be helpful? Learning curves are a useful diagnostic for this.

• A learning curve is a plot of the prediction accuracy/error vs. the training set size (i.e., how much better the model gets at predicting the target as you increase the number of instances used to train it).

• A learning curve conventionally depicts improvement in performance on the vertical axis when there are changes in another parameter on the horizontal axis, such as the training set size (in machine learning) or iteration/time.

• A learning curve is often useful to plot for algorithmic sanity checking or for improving performance.

• Plotting a learning curve can help diagnose the problems your algorithm is suffering from.
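A rough reading of the last point of a learning curve can be sketched as follows. This follows the common rule of thumb of comparing both curves against a desired error level; the function diagnose and its parameters are hypothetical, and the thresholds you pass in are your own choice.

```python
def diagnose(train_err, val_err, desired_err):
    """Hypothetical rule of thumb applied to the errors at the largest
    training-set size on a learning curve."""
    if val_err <= desired_err:
        return "good enough"
    if train_err > desired_err:
        # Even the training error misses the target, and training error
        # usually rises as data is added: the model underfits.
        return "high bias: more data unlikely to help"
    # Training error meets the target but validation error does not: the
    # gap between the curves suggests overfitting, which more data can close.
    return "high variance: more data may help"


print(diagnose(train_err=0.02, val_err=0.30, desired_err=0.10))
print(diagnose(train_err=0.25, val_err=0.27, desired_err=0.10))
```

In the high-bias case, a more expressive model (more features, less regularization) is the usual next step; in the high-variance case, more data or stronger regularization.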

Personally, the two links below helped me understand this concept better:

Learning Curve

Sklearn Learning Curve
