I am currently trying to find a way to randomize items in a dataframe row-wise. I found this thread on shuffling/permutation column-wise in pandas (shuffling/permutating a DataFrame in pandas), but for my purposes, is there a way to do something like
import pandas as pd
data = {'day': ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri'],
        'color': ['Blue', 'Red', 'Green', 'Yellow', 'Black'],
        'Number': [11, 8, 10, 15, 11]}
dataframe = pd.DataFrame(data)
Number color day
0 11 Blue Mon
1 8 Red Tues
2 10 Green Wed
3 15 Yellow Thurs
4 11 Black Fri
And randomize the values within each row into something like
Number color day
0 Mon Blue 11
1 Red Tues 8
2 10 Wed Green
3 15 Yellow Thurs
4 Black 11 Fri
I understand if the column headers would have to go away (or something of the like) in order to do this.
EDIT: So, in the thread I posted, part of the code refers to an "axis" parameter. I understand that axis=0 refers to the columns and axis=1 refers to the rows. I tried taking the code and changing the axis to 1, and it seems to randomize my dataframe only if the table consists entirely of numbers (as opposed to strings, or a combination of the two).
That said, should I consider not using dataframes? Is there a better 2D structure where I can randomize the rows and the columns if my data consists of only strings or a combination of ints and strings?
Edit: I misunderstood the question, which was just to shuffle rows and not the whole table (right?)
I think using dataframes does not make a lot of sense here, because the column names become useless. So you can just use 2D numpy arrays:
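A minimal sketch of that idea (not necessarily the code originally posted), assuming the question's sample data and NumPy's `Generator` API. The names `rng`, `arr`, `order` and `shuffled` are illustrative. Each row gets its own random column order, and because the reordering works on integer positions it does not care whether the values are ints, strings, or a mix:

```python
import numpy as np
import pandas as pd

data = {'day': ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri'],
        'color': ['Blue', 'Red', 'Green', 'Yellow', 'Black'],
        'Number': [11, 8, 10, 15, 11]}
df = pd.DataFrame(data)

rng = np.random.default_rng(0)  # seed is arbitrary, only for reproducibility

# Mixed ints and strings: ask for an object array explicitly.
arr = df.to_numpy(dtype=object)

# Sorting random keys gives an independent random column order for every row.
order = np.argsort(rng.random(arr.shape), axis=1)

# Reorder each row by its own permutation.
shuffled = np.take_along_axis(arr, order, axis=1)
print(shuffled)
```

Every row of `shuffled` holds the same three values as the matching row of `arr`, just in a random order.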
And if you want to keep the dataframe:
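One way this could look, as a sketch under the same assumptions (sample data from the question; `shuffled_df` is an illustrative name): shuffle the underlying object array per row and wrap the result back into a DataFrame with the original index and column labels.

```python
import numpy as np
import pandas as pd

data = {'day': ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri'],
        'color': ['Blue', 'Red', 'Green', 'Yellow', 'Black'],
        'Number': [11, 8, 10, 15, 11]}
df = pd.DataFrame(data)

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility

arr = df.to_numpy(dtype=object)
# Independent random column order per row.
order = np.argsort(rng.random(arr.shape), axis=1)

# Rebuild a DataFrame; the labels are kept, but they no longer
# describe what is stored underneath them.
shuffled_df = pd.DataFrame(np.take_along_axis(arr, order, axis=1),
                           index=df.index, columns=df.columns)
print(shuffled_df)
```

The resulting columns all have object dtype, since any column can now contain both ints and strings.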
Here is a function to shuffle rows and columns:
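A sketch of what such a function might look like. The helper names `shuffle_within` and `shuffle_rows_and_columns` are hypothetical, and this reading of "rows and columns" (shuffle the values inside each row, then inside each column) is an assumption:

```python
import numpy as np
import pandas as pd


def shuffle_within(df, axis, rng):
    """Shuffle values independently within each row (axis=1) or each column (axis=0)."""
    arr = df.to_numpy(dtype=object)
    # Random keys sorted along `axis` give one permutation per row/column.
    order = np.argsort(rng.random(arr.shape), axis=axis)
    return pd.DataFrame(np.take_along_axis(arr, order, axis=axis),
                        index=df.index, columns=df.columns)


def shuffle_rows_and_columns(df, rng=None):
    """Shuffle within rows first, then within columns."""
    rng = np.random.default_rng() if rng is None else rng
    return shuffle_within(shuffle_within(df, axis=1, rng=rng), axis=0, rng=rng)


data = {'day': ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri'],
        'color': ['Blue', 'Red', 'Green', 'Yellow', 'Black'],
        'Number': [11, 8, 10, 15, 11]}
df = pd.DataFrame(data)

result = shuffle_rows_and_columns(df, rng=np.random.default_rng(0))
print(result)
```

After both passes a value can end up in any cell of the table; only the overall set of values is preserved.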
Hope this helps
Maybe flatten the 2d array and then shuffle?
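A rough sketch of the flatten-then-shuffle idea, assuming the question's sample data (variable names are illustrative). Note that this mixes values across rows as well, so it shuffles the whole table rather than each row on its own:

```python
import numpy as np
import pandas as pd

data = {'day': ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri'],
        'color': ['Blue', 'Red', 'Green', 'Yellow', 'Black'],
        'Number': [11, 8, 10, 15, 11]}
df = pd.DataFrame(data)

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility

arr = df.to_numpy(dtype=object)

# Flatten to 1D, permute the positions, then restore the 2D shape.
flat = arr.ravel()
shuffled = flat[rng.permutation(flat.size)].reshape(arr.shape)
print(shuffled)
```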
Building on @jrjc's answer, I have posted https://stackoverflow.com/a/44686455/5009287, which uses
np.apply_along_axis()
See the full answer for how that could be integrated with a pandas df.
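For reference, a small sketch of how `np.apply_along_axis()` can be used for a row-wise shuffle. This is not a copy of the linked answer; the helper `shuffle_1d` is hypothetical and the data is the question's sample:

```python
import numpy as np
import pandas as pd


def shuffle_1d(values, rng):
    """Return a shuffled copy of a 1D array by permuting its positions."""
    return values[rng.permutation(values.size)]


data = {'day': ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri'],
        'color': ['Blue', 'Red', 'Green', 'Yellow', 'Black'],
        'Number': [11, 8, 10, 15, 11]}
df = pd.DataFrame(data)

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
arr = df.to_numpy(dtype=object)

# axis=1 hands each row to shuffle_1d; extra arguments (rng) are passed through.
shuffled = np.apply_along_axis(shuffle_1d, 1, arr, rng)

shuffled_df = pd.DataFrame(shuffled, index=df.index, columns=df.columns)
print(shuffled_df)
```

Passing `0` instead of `1` as the axis would shuffle within each column instead of within each row.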