I have a pandas data frame with 50k rows. I'm trying to add a new column that is a randomly generated integer from 1 to 5.
If I want 50k random numbers I'd use:
df1['randNumCol'] = random.sample(xrange(50000), len(df1))
but for this I'm not sure how to do it.
Side note: in R, I'd do:
sample(1:5, 50000, replace = TRUE)
Any suggestions?
One solution is to use np.random.randint. To make the results reproducible, you can set the seed with np.random.seed(42).
To add a column of random integers, use randint(low, high, size). There's no need to waste memory allocating range(low, high); that could be a lot of memory if high is large.
(Note also that when we're just adding a single column, size is just an integer. In general, if we want to generate an array/dataframe of randint()s, size can be a tuple, as in "Pandas: How to create a data frame of random integers?")
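For the 1 to 5 case in the question, a minimal sketch might look like the following. The frame contents, the "value" column, and df2 are made up for illustration; only df1 and randNumCol come from the question. Note that randint's upper bound is exclusive, so high is 6 here.

```python
import numpy as np
import pandas as pd

# Stand-in for the 50k-row frame from the question (contents are made up).
df1 = pd.DataFrame({"value": np.arange(50_000)})

np.random.seed(42)  # fix the seed so the column is reproducible

# high is exclusive, so 6 yields integers from 1 to 5 inclusive.
df1["randNumCol"] = np.random.randint(1, 6, size=len(df1))

# Passing a tuple as size gives a 2-D array, wrapped here in a new frame.
df2 = pd.DataFrame(np.random.randint(1, 6, size=(4, 3)), columns=list("abc"))
```

This samples with replacement, so it corresponds to the R call sample(1:5, 50000, replace = TRUE) rather than to random.sample, which draws without replacement.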