I am really new to machine learning. I was going through some examples in sklearn.
Can someone explain to me what "random_state" really means in the example below?
import numpy as np
from sklearn.model_selection import train_test_split
X, y = np.arange(10).reshape((5, 2)), range(5)
X
list(y)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
X_train
y_train
X_test
y_test
Why is it hard-coded to 42?
Random state ensures that the splits that you generate are reproducible. Scikit-learn uses random permutations to generate the splits. The random state that you provide is used as a seed to the random number generator. This ensures that the random numbers are generated in the same order.
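To make this concrete, here is a minimal sketch that reuses the data from the question. It calls train_test_split twice with the same seed and checks that both calls return the same arrays, then shows the same idea with NumPy alone. The variable names (a, b, perm_1, perm_2, same_seed_matches) are just illustrative, and the NumPy part only illustrates seeding; it is not meant to reproduce scikit-learn's internals.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(10).reshape((5, 2)), list(range(5))

# Two calls with the same seed produce the same split.
a = train_test_split(X, y, test_size=0.33, random_state=42)
b = train_test_split(X, y, test_size=0.33, random_state=42)
same_seed_matches = all(np.array_equal(p, q) for p, q in zip(a, b))
print(same_seed_matches)  # True

# The same idea with NumPy alone: equal seeds give equal permutations.
perm_1 = np.random.RandomState(42).permutation(5)
perm_2 = np.random.RandomState(42).permutation(5)
print(np.array_equal(perm_1, perm_2))  # True
```

Any fixed integer would work in place of 42; what matters is that the same value is used on every run.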
Isn't that obvious? 42 is the Answer to the Ultimate Question of Life, the Universe, and Everything.
On a serious note, random_state simply sets a seed for the random number generator, so that your train-test splits are always deterministic. If you don't set a seed, the split is different each time. Relevant documentation: