Let's say I have a DataFrame like this:
import pandas as pd

df = pd.DataFrame({'consumption': ['squirrel eats apple', 'monkey eats apple',
                                   'monkey eats banana', 'badger eats banana'],
                   'food': ['apple', 'apple', 'banana', 'banana'],
                   'creature': ['squirrel', 'badger', 'monkey', 'elephant']})
           consumption  creature    food
0  squirrel eats apple  squirrel   apple
1    monkey eats apple    badger   apple
2   monkey eats banana    monkey  banana
3   badger eats banana  elephant  banana
I want to find rows where that row's 'creature' and 'food' values occur together in its 'consumption' string, i.e. if squirrel and apple occur together, the result is True, but if apple occurs with elephant, it's False. Similarly, monkey and banana together should be True, but monkey and apple should be False.
The approach I was trying was something like:
import numpy as np

# Join the creatures (and foods) from ALL rows into one regex alternation,
# e.g. 'squirrel|badger|monkey|elephant'
creature_list = '|'.join(map(str, df['creature']))
food_list = '|'.join(map(str, df['food']))
np.where(df['consumption'].str.contains('(' + creature_list + ')', case=False)
         & df['consumption'].str.contains('(' + food_list + ')', case=False), 1, 0)
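But this doesn't work: every row comes back as 1 (True), because each pattern matches a creature or food from any row, not necessarily the pair from the same row.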
How can I check for these creature/food pairs row by row?
Is checking for string equality too simple? You can test whether the string
<creature> eats <food>
equals the respective value in the consumption column.

I'm sure there is a better way to do this, but this is one way.
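A minimal sketch of the string-equality test, assuming every 'consumption' value follows the exact "<creature> eats <food>" pattern. Both sides are lower-cased to mirror the case=False used in the question; the 'match' column name is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'consumption': ['squirrel eats apple', 'monkey eats apple',
                                   'monkey eats banana', 'badger eats banana'],
                   'food': ['apple', 'apple', 'banana', 'banana'],
                   'creature': ['squirrel', 'badger', 'monkey', 'elephant']})

# Build the sentence each row *should* contain, then compare element-wise
expected = df['creature'] + ' eats ' + df['food']
df['match'] = df['consumption'].str.lower() == expected.str.lower()
print(df)
```

This only works when the wording is fixed; a row like "squirrel happily eats apple" would be reported as False.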
Here's one possible way:
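One row-wise sketch, assuming creature and food names are single words: split each 'consumption' string into lower-case words and check that both the row's creature and its food are among them. The helper pair_occurs is hypothetical, written just for this example.

```python
import re

import pandas as pd

df = pd.DataFrame({'consumption': ['squirrel eats apple', 'monkey eats apple',
                                   'monkey eats banana', 'badger eats banana'],
                   'food': ['apple', 'apple', 'banana', 'banana'],
                   'creature': ['squirrel', 'badger', 'monkey', 'elephant']})


def pair_occurs(text, creature, food):
    """Return True if both creature and food appear as whole words in text (case-insensitive)."""
    words = set(re.findall(r'\w+', text.lower()))
    return creature.lower() in words and food.lower() in words


# Pair up the three columns row by row, so each row is checked against its own creature/food
df['match'] = [pair_occurs(text, creature, food)
               for text, creature, food in zip(df['consumption'], df['creature'], df['food'])]
print(df)
```

Unlike the equality test, this ignores word order and extra words, so "apple eaten by squirrel" would also match.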