Suppose you are training a custom tf.estimator.Estimator with tf.estimator.train_and_evaluate using a validation dataset, in a setup similar to @simlmx's:
classifier = tf.estimator.Estimator(
    model_fn=model_fn,
    model_dir=model_dir,
    params=params)

train_spec = tf.estimator.TrainSpec(
    input_fn=training_data_input_fn,
)

eval_spec = tf.estimator.EvalSpec(
    input_fn=validation_data_input_fn,
)

tf.estimator.train_and_evaluate(
    classifier,
    train_spec,
    eval_spec
)
Often, one uses a validation dataset to cut off training and prevent over-fitting, i.e. to stop when the loss keeps improving on the training dataset but no longer on the validation dataset.
Currently, tf.estimator.EvalSpec allows one to specify after how many steps (default: 100) to evaluate the model. How can one (if possible without using tf.contrib functions) configure training to terminate after n evaluation calls (n * steps) in which the evaluation loss does not improve, and then save the "best" model / checkpoint (as determined by the validation dataset) to a unique file name (e.g. best_validation.checkpoint)?
I understand your confusion now. The documentation for stop_if_no_decrease_hook describes the stopping condition in terms of training steps. Looking through the code of the hook (version 1.11), though, you find that what it actually does is load the evaluation results (produced with your EvalSpec parameters) and extract the eval metric and the global_step (or whichever other custom step you use to count) associated with each specific evaluation record.

This is the source of the "training steps" part of the docs: the early stopping is not triggered by the number of non-improving evaluations, but by the number of non-improving evaluations within a certain range of training steps (which, IMHO, is a bit counter-intuitive).

So, to recap: yes, the early-stopping hook uses the evaluation results to decide when it's time to cut the training, but you need to pass in the number of training steps you want to monitor, and keep in mind how many evaluations will happen in that number of steps.
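To make the step-range semantics concrete, here is a minimal pure-Python sketch of the decision the hook effectively makes. This is an illustration only, not the actual TensorFlow implementation; should_stop and the (global_step, metric) record format are made up for the example:

```python
def should_stop(eval_records, max_steps_without_decrease):
    """eval_records: list of (global_step, metric) tuples in evaluation order.

    Returns True when the monitored metric has not decreased for more than
    max_steps_without_decrease *training steps* (not evaluations).
    """
    best_metric = None
    best_step = None
    for step, metric in eval_records:
        if best_metric is None or metric < best_metric:
            best_metric, best_step = metric, step
    last_step = eval_records[-1][0]
    return last_step - best_step > max_steps_without_decrease
```

For instance, with one eval every 1k steps and a 10k-step window, a best metric reached at step 2000 triggers stopping only once the latest evaluation is past step 12000.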
Examples with numbers to hopefully clarify more
Let's assume you're training indefinitely, with an evaluation every 1k steps. The specifics of how the evaluation runs are not relevant, as long as it runs every 1k steps and produces a metric we want to monitor.
If you set the hook as

hook = tf.contrib.estimator.stop_if_no_decrease_hook(my_estimator, 'my_metric_to_monitor', 10000)

the hook will consider the evaluations happening in a range of 10k steps.

Since you're running 1 eval every 1k steps, this boils down to early-stopping if there's a sequence of 10 consecutive evals without any improvement. If you then decide to rerun with evals every 2k steps, the hook will only consider a sequence of 5 consecutive evals without improvement.
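The arithmetic above can be made explicit with a trivial sketch (pure Python; effective_patience is a made-up name for illustration):

```python
def effective_patience(window_steps, steps_per_eval):
    """How many consecutive non-improving evaluations fit in the
    monitored window of training steps."""
    return window_steps // steps_per_eval

print(effective_patience(10000, 1000))  # 10 evals in a 10k-step window
print(effective_patience(10000, 2000))  # 5 evals in the same window
```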
Keeping the best model
First of all, an important note: this has nothing to do with early stopping. Keeping a copy of the best model throughout training and stopping training once performance starts degrading are two completely unrelated problems.

Keeping the best model can be done very easily by defining a tf.estimator.BestExporter in your EvalSpec
(snippet taken from the link). If you don't know how to define the serving_input_fn, have a look here. This allows you to keep the overall best 5 models you obtained, stored as SavedModels (which is the preferred way to store models at the moment).
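The referenced snippet did not survive extraction, so here is a hedged reconstruction of such an EvalSpec. This is a sketch assuming the TF 1.x tf.estimator API; serving_input_receiver_fn and validation_data_input_fn are placeholders you must define yourself:

```python
import tensorflow as tf

# Keeps the 5 best models (by eval loss, the default compare_fn) as SavedModels.
exporter = tf.estimator.BestExporter(
    name="best_exporter",
    serving_input_receiver_fn=serving_input_receiver_fn,  # you define this
    exports_to_keep=5)

eval_spec = tf.estimator.EvalSpec(
    input_fn=validation_data_input_fn,
    exporters=exporter)
```

Each time an evaluation beats the best one seen so far, the exporter writes a new SavedModel under the estimator's model_dir and prunes the oldest exports beyond exports_to_keep.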