I have a list:
things = ['A1','B2','C3']
I have a pandas data frame with a column containing values separated by semicolons. Some of the rows will contain a match with one of the items in the list above. It won't be a perfect match, since the column holds other parts of a string as well; for example, a row in that column may have 'Wow;Here;This=A1;10001;0'.
I want to save the rows that contain a match with items from the list, and then create a new data frame with those selected rows (should have the same headers). This is what I tried:
import re
for_new_df = []
for x in df['COLUMN']:
    for mp in things:
        if df[df['COLUMN'].str.contains(mp)]:
            for_new_df.append(mp)  # This won't save the whole row - help here too, please.
This code gave me an error:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I'm very new to coding, so the more explanation and detail in your answer, the better! Thanks in advance.
Pandas is actually amazing, but I don't find it very easy to use. It does, however, have many functions designed to make life easy, including tools for searching through huge data frames.
Though it may not be a full solution to your problem, this may help set you off on the right foot. I have assumed that you know which column you are searching in, column A in my example.
The output:
You can avoid the loop by joining your list of words with '|' to create a regex and passing it to str.contains; that should just work.
So the regex pattern becomes:
'A1|B2|C3'
and this will match any string that contains one of these substrings anywhere in it.
Example:
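A minimal sketch of this approach, assuming a small made-up data frame (the 'ID' column and all rows except the first are hypothetical; only things, 'COLUMN' and the first string come from the question). The re.escape call is an extra precaution, not part of the answer above, in case an item ever contains regex special characters:

```python
import re
import pandas as pd

things = ['A1', 'B2', 'C3']

# Hypothetical sample data; only the column name 'COLUMN' and the
# first string are taken from the question.
df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'COLUMN': [
        'Wow;Here;This=A1;10001;0',
        'Nope;Nothing;0',
        'Other;This=C3;7',
        'Skip;This=D4;9',
    ],
})

# Join the items with '|' (regex "or"); escaping leaves plain
# alphanumeric items such as 'A1' unchanged.
pattern = '|'.join(re.escape(t) for t in things)

# str.contains returns a boolean Series with one value per row;
# na=False treats missing values as "no match".
mask = df['COLUMN'].str.contains(pattern, na=False)

# Indexing with the mask keeps whole rows and the same headers.
new_df = df[mask]
print(new_df)
```

Because the mask selects entire rows, new_df keeps every column of df, which covers the "save the whole row" part of the question.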
As to why it failed, this line:
if df[df['COLUMN'].str.contains(mp)]:
returns a df masked by the boolean array from your inner str.contains, and if doesn't know how to evaluate a whole DataFrame (or an array of booleans) as a single True or False, hence the error. If you think about it, what should it do if only one value is True, or all but one are True? It expects a scalar, not an array-like value.
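If you do want to keep a loop, one possible fix is to reduce the boolean Series to a single value before handing it to if, for instance with .any(). This is only a sketch under assumed data (the 'ID' column and the second and third rows are hypothetical):

```python
import pandas as pd

things = ['A1', 'B2', 'C3']

# Hypothetical sample data; only 'COLUMN' and the first string
# come from the question.
df = pd.DataFrame({
    'ID': [1, 2, 3],
    'COLUMN': [
        'Wow;Here;This=A1;10001;0',
        'Nope;Nothing;0',
        'Other;This=C3;7',
    ],
})

matched = []
for mp in things:
    # Boolean Series: one True/False per row (plain substring search).
    mask = df['COLUMN'].str.contains(mp, regex=False)
    # .any() collapses the Series to one bool, which `if` can evaluate.
    if mask.any():
        matched.append(mp)

print(matched)
```

Here .any() asks "does at least one row match?"; .all() would ask "do all rows match?". Either way, if receives a scalar, so the ValueError goes away.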