What are common sources of randomness in Machine L

Posted 2019-09-19 17:03

Reproducibility is important, but it is hard to achieve in a closed-source machine learning project I'm currently working on. Which parts should I look at?

Tags: python-3.x machine-learning scikit-learn keras reproducible-research

1 answer

迷人小祖宗

Reply #2 · 2019-09-19 17:59

Setting seeds

Computers use pseudo-random number generators, which are initialized with a value called the seed. In a machine learning project you may need to seed several of them:

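# Seed every generator before building the model. The order shown here
# (Python, NumPy, TensorFlow, then Keras) is reportedly important.
# Note: this uses the TensorFlow 1.x API with standalone Keras.
import random
random.seed(0)  # Python's built-in generator

import numpy as np
np.random.seed(0)  # NumPy's global generator

import tensorflow as tf
tf.set_random_seed(0)  # graph-level seed (TF 1.x)
# Run single-threaded so that parallel op scheduling cannot change results
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
                              inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)

from keras import backend as K
K.set_session(sess)  # tell Keras about the seeded session

# Import the remaining Keras modules only after this point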
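
To see what seeding buys you, here is a minimal sketch using only Python's built-in random module. The helper name sample_with_seed is made up for this illustration; the point is only that the same seed gives the same sequence, and a different seed gives a different one.

```python
import random


def sample_with_seed(seed, n=5):
    """Hypothetical helper: seed the global generator, then draw n floats."""
    random.seed(seed)
    return [random.random() for _ in range(n)]


first = sample_with_seed(0)
second = sample_with_seed(0)
other = sample_with_seed(1)

print(first == second)  # same seed, same sequence
print(first == other)   # different seed, different sequence
```

The same idea applies to the NumPy and TensorFlow generators, but each library keeps its own state, which is why each one has to be seeded separately.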

sklearn

sklearn.model_selection.train_test_split has a random_state parameter. Pass a fixed integer to get the same split on every run.
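
As a rough sketch of how that looks in practice (the toy arrays, the split helper, and the seed value 42 are all assumptions for illustration, not part of the original project):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, made up for this example: 10 samples with 2 features each
X = np.arange(20).reshape(10, 2)
y = np.arange(10)


def split(seed):
    """Hypothetical helper: hold out 3 samples using a fixed random_state."""
    return train_test_split(X, y, test_size=3, random_state=seed)


X_train_a, X_test_a, y_train_a, y_test_a = split(42)
X_train_b, X_test_b, y_train_b, y_test_b = split(42)

# With the same random_state, both calls select the same test rows
same_split = (np.array_equal(X_test_a, X_test_b)
              and np.array_equal(y_test_a, y_test_b))
print(same_split)
```

Many other scikit-learn estimators and splitters accept random_state as well, so it is worth checking each one you use.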

What to check

Do I load the data in the same order every time?
Do I initialize the model the same way?
Do I use external data that might change?
Do I use external state that might change (e.g. datetime.now)?
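
Two of these checks can be sketched with the standard library alone. The function names list_data_files and make_run_name are hypothetical, as are the file names. The idea is to sort directory listings (os.listdir makes no promise about order) and to pass the current time in as an argument instead of reading it deep inside the code.

```python
import os
import tempfile
from datetime import datetime


def list_data_files(directory):
    """Hypothetical helper: return file names in a stable, sorted order."""
    return sorted(os.listdir(directory))


def make_run_name(prefix, now=None):
    """Hypothetical helper: build a run name from an injectable timestamp."""
    if now is None:
        now = datetime.now()  # only used when the caller does not fix the time
    return f"{prefix}-{now:%Y%m%d-%H%M%S}"


# Create a few empty files in arbitrary order, then list them
with tempfile.TemporaryDirectory() as tmp:
    for name in ["b.csv", "a.csv", "c.csv"]:
        open(os.path.join(tmp, name), "w").close()
    files = list_data_files(tmp)

fixed = datetime(2019, 9, 19, 17, 59, 0)
run_name = make_run_name("exp", now=fixed)

print(files)
print(run_name)
```

Passing the timestamp in makes it easy to replay a run with exactly the same inputs.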
