Reproducibility is important, but in a closed-source machine learning project I'm currently working on it is hard to achieve. Which parts should I look at?
Answer 1:
Setting seeds
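Computers use pseudo-random number generators, which are initialized with a value called the seed. The same seed yields the same sequence of numbers. Python, NumPy and TensorFlow each keep their own generator, so for machine learning you might need to seed all of them: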
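# TensorFlow 1.x / standalone Keras API
# I've heard the order here is important: seed everything before importing the Keras parts
import random
random.seed(0)  # Python's built-in generator

import numpy as np
np.random.seed(0)  # NumPy's global generator

import tensorflow as tf
tf.set_random_seed(0)  # graph-level seed for TensorFlow
# Force single-threaded ops; multiple threads are a potential source of non-reproducible results
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
                              inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)

from keras import backend as K
K.set_session(sess)  # tell keras about the seeded session
# now import keras stuff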
See also: Keras FAQ: How can I obtain reproducible results using Keras during development?
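To see what seeding actually buys you, here is a minimal sketch using only the standard library. The helper name `draw` is made up for this illustration; the point is just that two generators initialized with the same seed produce the same numbers, while a different seed produces a different sequence.

```python
import random


def draw(seed, n=5):
    """Draw n pseudo-random floats from a generator initialized with seed."""
    rng = random.Random(seed)  # private generator, independent of the global one
    return [rng.random() for _ in range(n)]


first = draw(0)
second = draw(0)   # same seed as `first`
other = draw(1)    # different seed

print(first == second)  # same seed, same sequence
print(first == other)   # different seed, different sequence
```

Using a private `random.Random(seed)` instance instead of the global `random.seed` also keeps other code from advancing your generator behind your back.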
sklearn
sklearn.model_selection.train_test_split has a random_state parameter. Pass a fixed integer to get the same split on every run.
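A small sketch of what that looks like in practice, assuming scikit-learn is installed. The toy data (`X`, `y`) and the choice of `random_state=0` are made up for the example; any fixed integer would do.

```python
from sklearn.model_selection import train_test_split

# Toy data, invented for this illustration
X = list(range(10))
y = [i % 2 for i in X]

# Same random_state: the split is identical on every call
split_a = train_test_split(X, y, test_size=0.2, random_state=0)
split_b = train_test_split(X, y, test_size=0.2, random_state=0)

X_train, X_test, y_train, y_test = split_a
print(split_a == split_b)
print(len(X_train), len(X_test))
```

If you leave `random_state` unset, the split depends on NumPy's global random state and will usually differ from run to run.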
What to check
- Am I loading the data in the same order every time?
- Do I initialize the model the same way?
- Do I use external data that might change?
- Do I use external state that might change (e.g. datetime.now)?
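A few of the checks above can be sketched with the standard library alone. All helper names here (`list_data_files`, `shuffled`, `make_run_name`) are hypothetical and only illustrate the idea: sort directory listings so the load order is fixed, shuffle with a seeded generator, and pass the current time in as an argument instead of reading it inside the function.

```python
import random
import tempfile
from datetime import datetime
from pathlib import Path


def list_data_files(data_dir):
    """Return file paths in a fixed order; directory listings have no guaranteed order."""
    return sorted(p for p in Path(data_dir).iterdir() if p.is_file())


def shuffled(items, seed):
    """Return a shuffled copy of items, using a private seeded generator."""
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    return items


def make_run_name(now):
    """Build a run name from a timestamp passed in by the caller."""
    return "run-" + now.strftime("%Y%m%d-%H%M%S")


# Create a throwaway directory with files written in non-alphabetical order
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.csv", "a.csv", "c.csv"):
        (Path(tmp) / name).write_text("x\n")
    file_names = [p.name for p in list_data_files(tmp)]

order_a = shuffled(file_names, seed=0)
order_b = shuffled(file_names, seed=0)
run_name = make_run_name(datetime(2020, 1, 1, 12, 0, 0))

print(file_names)          # always sorted
print(order_a == order_b)  # same seed, same shuffle
print(run_name)
```

Taking `now` as a parameter means a rerun can be given the original timestamp, which is not possible when `datetime.now()` is called deep inside the pipeline.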