I have a pandas dataframe that I'd like to filter by a specific word (test) in a column. I tried:
df[df[col].str.contains('test')]
But it returns an empty dataframe with just the column names. For the output, I'm looking for a dataframe that'd contain all rows that contain the word 'test'. What can I do?
EDIT (to add samples):
data = pd.read_csv(/...csv)
data has 5 cols, including 'BusinessDescription', and I want to extract all rows that have the word 'dental' (case insensitive) in the Business Description col, so I used:
filtered = data[data['BusinessDescription'].str.contains('dental')==True]
and I get an empty dataframe, with just the header names of the 5 cols.
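It seems you need the flags parameter in str.contains, e.g. flags=re.IGNORECASE, because the match is case sensitive by default.
Another solution (thanks, Anton vBR) is to convert the column to lowercase first with str.lower() and then call str.contains.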
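Both approaches can be sketched roughly as follows. The rows below are made up as a stand-in for the CSV in the question; only the column name BusinessDescription comes from the original post.

```python
import re

import pandas as pd

# Hypothetical stand-in for the CSV loaded in the question.
data = pd.DataFrame({
    "BusinessDescription": ["Dental clinic", "Bakery", "DENTAL supplies", "dental lab"],
})

# Option 1: pass flags=re.IGNORECASE so the match ignores case.
by_flags = data[data["BusinessDescription"].str.contains("dental", flags=re.IGNORECASE)]

# Option 2: lowercase the column first, then match the lowercase word.
by_lower = data[data["BusinessDescription"].str.lower().str.contains("dental")]
```

Both masks should select the same rows here; which one to prefer is mostly a matter of taste and speed on your data.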
Example:
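A small illustration of why the original attempt came back empty, assuming invented sample rows (the City column and all values are hypothetical). The na=False argument is an addition here so that a missing description counts as a non-match instead of producing NaN in the mask.

```python
import re

import pandas as pd

# Made-up rows; note that no description contains lowercase 'dental'.
df = pd.DataFrame({
    "BusinessDescription": ["Dental clinic", "Bakery", "DENTAL supplies", None],
    "City": ["Austin", "Boston", "Chicago", "Denver"],
})

# Case-sensitive (the default): nothing matches, so only the headers remain.
strict = df[df["BusinessDescription"].str.contains("dental", na=False)]

# Case-insensitive: the rows with 'Dental' and 'DENTAL' are kept.
relaxed = df[df["BusinessDescription"].str.contains("dental", flags=re.IGNORECASE, na=False)]

print(strict.empty)
print(relaxed)
```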
For future programming I'd recommend using the name df instead of data when referring to dataframes. It is the common notation around SO.
Timings:
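One way to compare the two approaches on your own data is the standard timeit module. This is only a rough harness: the frame contents, the row count, and the repeat count are arbitrary assumptions, and the numbers it prints will differ from machine to machine.

```python
import re
import timeit

import pandas as pd

# Arbitrary made-up frame: 30,000 rows, two thirds of which match.
df = pd.DataFrame({
    "BusinessDescription": ["Dental clinic", "Bakery", "DENTAL supplies"] * 10_000,
})
col = df["BusinessDescription"]


def with_flags():
    # Case-insensitive match through the regex flag.
    return df[col.str.contains("dental", flags=re.IGNORECASE)]


def with_lower():
    # Lowercase the column, then do a plain match.
    return df[col.str.lower().str.contains("dental")]


# Total seconds for 5 runs of each approach.
t_flags = timeit.timeit(with_flags, number=5)
t_lower = timeit.timeit(with_lower, number=5)
print(f"flags: {t_flags:.4f}s  lower: {t_lower:.4f}s")
```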
Caveat:
Performance really depends on the data: the size of the DataFrame and the number of values matching the condition.
Keep the string enclosed in quotes.
Thanks
It also works OK if you add a condition.
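For instance, the text mask can be combined with a second boolean condition using &, with each condition in its own parentheses. A sketch, assuming a hypothetical Employees column and made-up rows:

```python
import re

import pandas as pd

# Made-up rows; Employees is a hypothetical extra column.
df = pd.DataFrame({
    "BusinessDescription": ["Dental clinic", "Bakery", "DENTAL supplies"],
    "Employees": [5, 40, 25],
})

# Rows that mention 'dental' (any case) AND have more than 10 employees.
mask = (
    df["BusinessDescription"].str.contains("dental", flags=re.IGNORECASE)
    & (df["Employees"] > 10)
)
filtered = df[mask]
```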