Find specific Row of Data from Pandas Dataframe in

Posted 2019-08-26 12:20

  1. I am trying to take a csv, and read it as a Pandas Dataframe.
  2. This Dataframe contains 4 rows of numbers.
  3. I want to pick a specific row of data from the Dataframe.
  4. In a while loop, I want to select a random row from the Dataframe and compare it to the row that I picked.
  5. I want the loop to keep running until the random row is exactly equal to the row I picked earlier.
  6. Then I want the while loop to break, having counted how many tries it took to match the row.

Here's what I have so far:

This is an example of the Dataframe:

    A  B  C  D
1   2  7  12 14
2   4  5  11 23
3   4  6  14 20
4   4  7  13 50
5   9  6  14 35

Here is an example of my efforts:

import time
import pandas as pd

then = time.time()

count = 0

df = pd.read_csv('Get_Numbers.csv')
df.columns = ['A', 'B', 'C', 'D']

while True:
    df_elements = df.sample(n=1)
    random_row = df_elements
    print(random_row)
    find_this_row = df['A','B','C','D' == '4','7','13,'50']
    print(find_this_row)
    if find_this_row != random_row:
        count += 1
    else:
        break

print("You found the correct numbers! And it only took " + str(count) + " tries to get there! Your numbers were: " + str(find_this_row))

now = time.time()

print("It took: ", now-then, " seconds")

The above code gives an obvious error... but I have tried so many different versions now of finding the find_this_row numbers that I just don't know what to do anymore, so I left this attempt in.

I would like to avoid using the specific index of the row I am trying to find. I would rather find it using just its values.

I am using df_elements = df.sample(n=1) to select a row at random. I chose this over random.choice because I was not sure whether that would work or which approach is more time/memory efficient, but I'm open to advice on that as well.

In my mind it seems simple: randomly select a row of data and, if it doesn't match the row I want, keep randomly selecting rows until one does. But I can't seem to execute it.

Any help is EXTREMELY Appreciated!

4 Answers
做自己的国王
#2 · 2019-08-26 12:24

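You can use .values, which returns an np.ndarray of shape (1, 2) for a one-row sample of this two-column DataFrame; use .values[0] to get a 1D array.

Then compare the arrays element by element and use any() to check whether any element differs: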

import time
import pandas as pd

then = time.time()

df = pd.DataFrame(data={'A': [1, 2, 3],
                        'B': [8, 9, 10]})

find_this_row = [2, 9]
print("Looking for: {}".format(find_this_row))

count = 0
while True:
    random_row = df.sample(n=1).values[0]
    print(random_row)

    if any(find_this_row != random_row):
        count += 1
    else:
        break

print("You found the correct numbers! And it only took " + str(count) + " tries to get there! Your numbers were: " + str(find_this_row))

now = time.time()

print("It took: ", now-then, " seconds")
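
As a possible variation on the loop above, np.array_equal tests the whole row in one call instead of combining != with any(). This is only a sketch on the same toy DataFrame; to_numpy() is used here in place of .values, and the variable names are reused from the answer for illustration.

```python
import numpy as np
import pandas as pd

# Same toy data as in the answer above
df = pd.DataFrame(data={'A': [1, 2, 3],
                        'B': [8, 9, 10]})
find_this_row = [2, 9]

count = 0
while True:
    # Sample one row and flatten it to a 1D array
    random_row = df.sample(n=1).to_numpy()[0]
    # True only if both have the same shape and all elements are equal
    if np.array_equal(random_row, find_this_row):
        break
    count += 1

print("Matched {} after {} failed tries".format(find_this_row, count))
```

np.array_equal also returns False (rather than raising) when the two shapes differ, which can make a mismatch in the number of columns easier to spot.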
趁早两清
#3 · 2019-08-26 12:36

A couple of hints first. This line does not work for me:

find_this_row = df['A','B','C','D' == '4','7','13,'50']

For 2 reasons:

  • a missing " ' " after ,'13
  • df is a DataFrame(), so using keys like below is not supported

df['A','B','C','D' ...

Either use keys to return a DataFrame():

df[['A','B','C','D']]

or as a Series():

df['A']

Since you need the whole row across multiple columns, take its values (df2 here is the DataFrame I loaded for testing):

df2.iloc[4].values

array(['4', '7', '13', '50'], dtype=object)

Do the same with your sample row:

df2.sample(n=1).values

Comparison between rows needs to be done for all() elements/columns:

df2.sample(n=1).values == df2.iloc[4].values

array([[ True, False, False, False]])

Adding .all() collapses this to a single value:

(df2.sample(n=1).values == df2.iloc[4].values).all()

which returns a single True or False.

All together:

import time
import pandas as pd

then = time.time()
count = 0
# Look up the target row once, outside the loop
find_this_row = df2.iloc[4].values
while True:
    random_row = df2.sample(n=1).values
    # .all() is True only when every column matches
    if not (random_row == find_this_row).all():
        count += 1
    else:
        break

print("You found the correct numbers! And it only took " + str(count) + " tries to get there! Your numbers were: " + str(find_this_row))

now = time.time()

print("It took: ", now-then, " seconds")
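
The question also asked for a way to pick the target row by its values rather than by its index. One way this might be done is with a boolean mask. The sketch below rebuilds the example table by hand instead of reading Get_Numbers.csv, and assumes the columns hold integers (if the CSV were read as strings, the target would need to be strings too). The names target and mask are made up for illustration.

```python
import pandas as pd

# Stand-in for the question's CSV, values copied from the example table
df = pd.DataFrame({'A': [2, 4, 4, 4, 9],
                   'B': [7, 5, 6, 7, 6],
                   'C': [12, 11, 14, 13, 14],
                   'D': [14, 23, 20, 50, 35]},
                  index=[1, 2, 3, 4, 5])

target = [4, 7, 13, 50]

# Compare every row with the target column by column,
# then keep only rows where all four columns match
mask = (df == target).all(axis=1)
find_this_row = df[mask].to_numpy()[0]

print(find_this_row)
```

The resulting find_this_row can then be compared against each sampled row in the loop, in place of the iloc lookup.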
够拽才男人
#4 · 2019-08-26 12:37

Here's a method that tests one row at a time. We check if the values of the chosen row are equal to the values of the sampled DataFrame. We require that they all match.

row = df.sample(1)

counter = 0
not_a_match = True

while not_a_match:
    not_a_match = ~(df.sample(n=1).values == row.values).all()
    counter+=1

print(f'It took {counter} tries and the numbers were\n{row}')
#It took 9 tries and the numbers were
#   A  B   C   D
#4  4  7  13  50

If you want it to run a bit faster, select one row and then sample the DataFrame with replacement many times in a single call. The position of the first sampled row that equals your chosen row tells you how many 'tries' a while loop would have needed, in much less time. The outer loop protects against the unlikely case that a batch contains no match, which can happen because the sampling is with replacement.

row = df.sample(1)

n = 0
none_match = True
k = 10  # Increase to check more matches at once.

while none_match:
    matches = (df.sample(n=len(df)*k, replace=True).values == row.values).all(1)
    none_match = ~matches.any()  # Determine if none still match
    n += k*len(df)*none_match  # Only increment if none match
n = n + matches.argmax() + 1

print(f'It took {n} tries and the numbers were\n{row}')
#It took 3 tries and the numbers were
#   A  B   C   D
#4  4  7  13  50
淡お忘
#5 · 2019-08-26 12:38

How about using values?

values returns a NumPy array of the row's values, and two such arrays are easy to compare.

array1 == array2 returns an array of True and False values, because it compares the corresponding elements of the two arrays. You can then check whether all of the returned values are True, for example with .all().
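
One detail worth checking before relying on this: plain Python lists and NumPy arrays behave differently under ==. A small sketch, with made-up numbers, of the two cases:

```python
import numpy as np

list1 = [4, 7, 13, 50]
list2 = [4, 7, 13, 51]

# Plain Python lists compare as a whole and give one bool
list_result = (list1 == list2)

# NumPy arrays (what .values returns) compare element by element
arr_result = (np.array(list1) == np.array(list2))

# Collapse the elementwise result into a single True/False
all_match = arr_result.all()

print(list_result, arr_result, all_match)
```

So the elementwise check followed by .all() applies to the arrays returned by values, not to lists built by hand.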
