- I am trying to take a csv, and read it as a Pandas Dataframe.
- This Dataframe contains 4 columns of numbers.
- I want to pick a specific row of data from the Dataframe.
- In a While Loop, I want to select a random row from the Dataframe and compare it to the row that I picked.
- I want it to keep running through the while loop until that random row is 100% equal to the row I picked earlier.
- Then I want the While Loop to break, and I want it to have counted how many tries it took to find the match.
Here's what I have so far:
This is an example of the Dataframe:
   A  B   C   D
1  2  7  12  14
2  4  5  11  23
3  4  6  14  20
4  4  7  13  50
5  9  6  14  35
Here is an example of my efforts:
import time
import pandas as pd
then = time.time()
count = 0
df = pd.read_csv('Get_Numbers.csv')
df.columns = ['A', 'B', 'C', 'D']
while True:
    df_elements = df.sample(n=1)
    random_row = df_elements
    print(random_row)
    find_this_row = df['A','B','C','D' == '4','7','13,'50']
    print(find_this_row)
    if find_this_row != random_row:
        count += 1
    else:
        break
print("You found the correct numbers! And it only took " + str(count) + " tries to get there! Your numbers were: " + str(find_this_row))
now = time.time()
print("It took: ", now-then, " seconds")
The above code gives an obvious error... but I have tried so many different versions of finding the find_this_row numbers that I just don't know what to do anymore, so I left this attempt in.
What I would like to avoid is using the specific index for the row I am trying to find; I would rather use just the values to find it.
I am using df_elements = df.sample(n=1) to select a row at random. This was to avoid using random.choice, as I was not sure if that would work or which way is more time/memory efficient, but I'm open to advice on that as well.
In my mind it seems simple: randomly select a row of data, and if it doesn't match the row of data that I want, keep randomly selecting rows until it does. But I can't seem to execute it.
Any help is EXTREMELY Appreciated!
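You can use values, which returns an np.ndarray of shape (1, 4) for a single sampled row of this Dataframe; use values[0] to get just the 1D array. Then compare the arrays element-wise and reduce the result with any(): (a != b).any() stays True as long as at least one value differs.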
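As a minimal sketch of that idea, assuming the sample data from the question is built inline (instead of being read from Get_Numbers.csv) and that the wanted row is given by its values, the comparison could look like this. The names target, row_2d, row_1d and is_different are illustrative only.

```python
import pandas as pd

# Inline stand-in for the CSV from the question (assumption for this sketch).
df = pd.DataFrame(
    {
        "A": [2, 4, 4, 4, 9],
        "B": [7, 5, 6, 7, 6],
        "C": [12, 11, 14, 13, 14],
        "D": [14, 23, 20, 50, 35],
    },
    index=range(1, 6),
)

target = [4, 7, 13, 50]  # the row we are looking for, given by value

sampled = df.sample(n=1, random_state=0)  # one-row DataFrame
row_2d = sampled.values      # 2D array, shape (1, 4)
row_1d = sampled.values[0]   # 1D array, shape (4,)

# Element-wise comparison; any() is True while at least one value differs.
is_different = (row_1d != target).any()

# Sanity check: the wanted row compared with itself has no differing value.
target_matches_itself = not (df.loc[4].values != target).any()
```

In a while loop, is_different would be the condition for counting another try and sampling again.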
A couple of hints first. This line does not work for me:
find_this_row = df['A','B','C','D' == '4','7','13,'50']
For 2 reasons:
Either use a list of keys to return a DataFrame:
or a single key to return a Series:
Since you need the whole row with multiple columns, do this:
Do the same with your sample row:
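The snippets for these steps are not shown here, so the following is only a sketch under assumptions: the sample data is built inline, and the wanted row is located with a boolean mask on its values (one possible way to avoid using the row's index, as the question asks). The names as_frame, as_series and mask are illustrative.

```python
import pandas as pd

# Inline stand-in for the CSV from the question (assumption for this sketch).
df = pd.DataFrame(
    {
        "A": [2, 4, 4, 4, 9],
        "B": [7, 5, 6, 7, 6],
        "C": [12, 11, 14, 13, 14],
        "D": [14, 23, 20, 50, 35],
    },
    index=range(1, 6),
)

as_frame = df[["A"]]   # a list of keys returns a DataFrame
as_series = df["A"]    # a single key returns a Series

# Select the wanted row by its values, not by its index label.
mask = (df["A"] == 4) & (df["B"] == 7) & (df["C"] == 13) & (df["D"] == 50)
find_this_row = df[mask]  # one-row DataFrame with all four columns

# The sampled row has the same layout: one row, four columns.
random_row = df.sample(n=1)
```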
The comparison between rows needs to hold for all elements/columns,
which you get by adding .all() like the following:
which returns
All together:
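A minimal sketch of the complete loop, assuming the data is built inline rather than read from Get_Numbers.csv and that the wanted row is found by a boolean mask on its values. Note that count here includes the successful try, which differs slightly from the question's code, where only failed tries are counted.

```python
import time

import pandas as pd

# Inline stand-in for pd.read_csv('Get_Numbers.csv') (assumption for this sketch).
df = pd.DataFrame(
    {
        "A": [2, 4, 4, 4, 9],
        "B": [7, 5, 6, 7, 6],
        "C": [12, 11, 14, 13, 14],
        "D": [14, 23, 20, 50, 35],
    },
    index=range(1, 6),
)

start = time.time()

# Locate the wanted row by value and keep it as a 1D array.
mask = (df["A"] == 4) & (df["B"] == 7) & (df["C"] == 13) & (df["D"] == 50)
target = df[mask].values[0]

count = 0
while True:
    count += 1
    random_row = df.sample(n=1).values[0]  # 1D array of the sampled row
    # Stop only when every column of the sampled row matches the target.
    if (random_row == target).all():
        break

elapsed = time.time() - start
print(f"Matched {target.tolist()} after {count} tries in {elapsed:.4f} seconds")
```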
Here's a method that tests one row at a time. We check if the values of the chosen row are equal to the values of the sampled DataFrame. We require that they all match.
If you want to get a little bit faster, you select one row and then sample the DataFrame with replacement many times. You can then check for the first time a row of the sampled DataFrame equals your chosen row, giving you how many 'tries' it would have taken in a while loop, but in much less time. The loop protects against the unlikely case we do not find a match, given that it's sampling with replacement.
How about using values? values will return a NumPy array of the row's values, and you can then compare two such arrays easily. array1 == array2 will return an array of True and False values, as it compares the elements at corresponding positions. You can then check if all of the values returned are True.
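For the faster variant mentioned above (sampling with replacement many times and locating the first match), a rough sketch might look like the following. The function name tries_until_match and its batch_size and max_batches parameters are hypothetical, chosen for this example, and the data is again built inline as an assumption.

```python
import numpy as np
import pandas as pd

# Inline stand-in for the CSV from the question (assumption for this sketch).
df = pd.DataFrame(
    {
        "A": [2, 4, 4, 4, 9],
        "B": [7, 5, 6, 7, 6],
        "C": [12, 11, 14, 13, 14],
        "D": [14, 23, 20, 50, 35],
    },
    index=range(1, 6),
)


def tries_until_match(frame, target, batch_size=1000, max_batches=100):
    """Return how many random draws it took to hit `target`, or None."""
    tries_so_far = 0
    # The outer loop guards against a batch that happens to contain no match.
    for _ in range(max_batches):
        batch = frame.sample(n=batch_size, replace=True)
        # One boolean per sampled row: True where all columns equal the target.
        hits = (batch.values == target).all(axis=1)
        if hits.any():
            # argmax gives the position of the first True in this batch.
            return tries_so_far + int(hits.argmax()) + 1
        tries_so_far += batch_size
    return None


tries = tries_until_match(df, np.array([4, 7, 13, 50]))
print(f"It took {tries} tries")
```

Each batch replaces up to batch_size iterations of the while loop with a single vectorized comparison, which is where the time saving comes from.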