I am really new to machine learning, and I was going through some examples in
sklearn.
Can someone explain what "random_state" really means in the example below?
import numpy as np
from sklearn.model_selection import train_test_split
X, y = np.arange(10).reshape((5, 2)), range(5)
X
list(y)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
X_train
y_train
X_test
y_test
Why is it hard-coded to 42?
Isn't that obvious? 42 is the Answer to the Ultimate Question of Life, the Universe, and Everything.
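On a serious note, random_state simply sets a seed for the random number generator, so that your train-test splits are always deterministic. If you don't set a seed, the split is different each time you run the code. The value 42 itself has no special meaning; any fixed integer works.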
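To see the determinism for yourself, here is a minimal sketch that reuses the data from the question and calls train_test_split twice with the same seed. The variable names (X_test_a, same_seed_matches, and so on) are just mine for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(10).reshape((5, 2)), list(range(5))

# Two independent calls with the same integer seed.
_, X_test_a, _, y_test_a = train_test_split(
    X, y, test_size=0.33, random_state=42)
_, X_test_b, _, y_test_b = train_test_split(
    X, y, test_size=0.33, random_state=42)

# Both calls select exactly the same test rows and labels.
same_seed_matches = (np.array_equal(X_test_a, X_test_b)
                     and np.array_equal(y_test_a, y_test_b))
print(same_seed_matches)
```

If you drop random_state (or pass None), repeated runs are free to give different splits, which is why tutorials usually fix it.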
Relevant documentation:
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
Random state ensures that the splits that you generate are reproducible. Scikit-learn uses random permutations to generate the splits. The random state that you provide is used as a seed to the random number generator. This ensures that the random numbers are generated in the same order.
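The permutation idea can be sketched with plain NumPy. This is a simplified illustration, not scikit-learn's actual implementation; toy_split is a made-up function name.

```python
import numpy as np

def toy_split(n_samples, n_test, seed):
    """Shuffle the row indices with a seeded generator, then cut them in two."""
    rng = np.random.RandomState(seed)
    perm = rng.permutation(n_samples)
    return perm[n_test:], perm[:n_test]  # (train indices, test indices)

train_a, test_a = toy_split(5, 2, seed=42)
train_b, test_b = toy_split(5, 2, seed=42)

# The same seed yields the same permutation, hence the same split.
repeatable = (np.array_equal(train_a, train_b)
              and np.array_equal(test_a, test_b))
print(repeatable)
```

Every index lands in exactly one of the two parts, and the seed alone decides which.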