Question:

I'm kind of new to Python. Can anyone tell me why we set random_state to zero when splitting the data into train and test sets?

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = \
    train_test_split(X, y, test_size=0.30, random_state=0)

I have also seen cases like this one, where random_state is set to one:

X_train, X_test, y_train, y_test = \
    train_test_split(X, y, test_size=0.30, random_state=1)

What is the effect of random_state in cross-validation as well?

Answer 1:

It doesn't matter whether random_state is 0, 1, or any other integer. What matters is that you set it to the same value every time if you want to validate your processing over multiple runs of the code. By the way, I have seen random_state=42 used in many official scikit-learn examples, as well as elsewhere.
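A minimal sketch of this point, assuming scikit-learn and NumPy are installed; the toy arrays X and y and the helpers split_with and same_split are made up for illustration. Whatever integer you pick, repeating the call with that same integer reproduces the split exactly:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: row i of X is [2i, 2i + 1] and its label is i.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)


def split_with(seed):
    """Split exactly as in the question; only the seed changes."""
    return train_test_split(X, y, test_size=0.30, random_state=seed)


def same_split(a, b):
    """True if two (X_train, X_test, y_train, y_test) tuples match element-wise."""
    return all(np.array_equal(p, q) for p, q in zip(a, b))


# Any integer works, as long as you reuse the same one.
for seed in (0, 1, 42):
    print(seed, same_split(split_with(seed), split_with(seed)))
# prints "0 True", "1 True" and "42 True" on separate lines
```

The labels stay aligned with their rows in every split, so only the choice of rows, not their pairing, depends on the seed.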

random_state, as the name suggests, is used to initialize the internal random number generator, which in your case decides how the data is split into train and test indices. The documentation states that:

If random_state is None or np.random, then a randomly-initialized RandomState object is returned.

If random_state is an integer, then it is used to seed a new RandomState object.

If random_state is a RandomState object, then it is passed through.
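scikit-learn exposes this resolution logic as the public helper sklearn.utils.check_random_state, which handles the same three cases. A small sketch (variable names are illustrative); note that in recent versions None resolves to NumPy's global RandomState instance, which is why unseeded results vary from run to run:

```python
import numpy as np
from sklearn.utils import check_random_state

# Integer: a new RandomState object seeded with that integer.
rng_a = check_random_state(0)
rng_b = check_random_state(0)
print(rng_a is rng_b)  # False: two separate objects...
print(np.array_equal(rng_a.rand(3), rng_b.rand(3)))  # True: ...producing the same numbers

# RandomState instance: passed through unchanged.
shared = np.random.RandomState(0)
print(check_random_state(shared) is shared)  # True

# None: NumPy's global RandomState, whose state differs between runs.
print(isinstance(check_random_state(None), np.random.RandomState))  # True
```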

This makes it possible to check and validate results when running the code multiple times. Setting random_state to a fixed value guarantees that the same sequence of random numbers is generated each time you run the code, and unless some other source of randomness is present in the process, the results will be the same every time. This helps in verifying the output.
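The question also asks about cross-validation. The same idea carries over to scikit-learn splitters that shuffle. As a sketch, assuming KFold with shuffle=True (with shuffle=False the folds simply follow the data order and random_state has no effect), a fixed random_state yields the same folds on every run. The toy data and the helper fold_test_indices are hypothetical:

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical toy data: 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)


def fold_test_indices(seed):
    """Return the test indices of each fold as plain lists."""
    # random_state only matters when shuffle=True.
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    return [test_idx.tolist() for _, test_idx in kf.split(X)]


# Same seed, same folds, no matter how often the code is run.
print(fold_test_indices(0) == fold_test_indices(0))  # True
```

Fixing the splitter's seed does not fix randomness inside a model; estimators such as RandomForestClassifier take their own random_state parameter, which is one example of the "other randomness" mentioned above.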

Answer 2:

With random_state the data is still split randomly, but with a twist: for a particular value of random_state, the order of the data will always be the same. Note that it is not an on/off (boolean) switch. Any integer, from 0 upward, that you pass as random_state gives a fixed, permanent order. For example, the order you get with random_state=0 stays the same: if you then run with random_state=5 and come back to random_state=0, you will get the same order as before. The same holds for every integer. However, random_state=None splits randomly each time.

If you still have doubts, watch this.

Answer 3:

If you don't specify random_state in the code, a new random value is generated whenever you execute your code, and the train and test datasets will contain different values each time.

However, if you use a particular value for random_state (random_state=1 or any other value), the result will be the same every time, i.e., the train and test datasets will contain the same values.