I am working with the Open Food Facts dataset, which is very messy. There is a column called `quantity` that holds information about the quantity of the respective food. The entries look like this:
```
365 g (314 ml)
992 g
2.46 kg
0,33 litre
15.87oz
250 ml
1 L
33 cl
```
... and so on (very messy!!!)
I want to create a new column called `is_liquid`. My idea is that if the quantity string contains an `l` or an `L`, the `is_liquid` field in that row should get a 1, and otherwise a 0.
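One caveat with a plain `l`/`L` substring check: it also fires on units that merely contain the letter, such as `lb` (pound). A stricter sketch, assuming that liquids are exactly the rows with an explicit volume unit (`l`, `ml`, `cl`, `dl`, `litre`/`liter`, or `fl oz`) directly after a number. The unit list and the name `is_liquid_strict` are hypothetical choices, not anything defined by Open Food Facts. Plain `oz` is deliberately left out, because it normally denotes the weight ounce, while the volume unit is usually written `fl oz`:

```python
import re

# Hypothetical stricter rule: a value counts as liquid only if a number is
# followed by a volume unit token (l, ml, cl, dl, litre/liter, fl oz).
# The unit list is an assumption, not an Open Food Facts convention.
LIQUID_UNIT = re.compile(
    r"\d\s*(?:[mcd]?l|litres?|liters?|fl\.?\s*oz)\b",
    flags=re.IGNORECASE,
)


def is_liquid_strict(quantity):
    """Return 1 if the quantity string contains a volume unit, else 0."""
    # Missing values (NaN, None) are not strings; treat them as not liquid
    if not isinstance(quantity, str):
        return 0
    return int(bool(LIQUID_UNIT.search(quantity)))
```

It can be applied per row with `df['quantity'].apply(is_liquid_strict)`. With this rule `1 lb` gives 0 and `12 fl oz` gives 1, whereas the plain substring check would mark `1 lb` as liquid.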
Here is what I've tried:
I wrote this function:
```python
def is_liquid(x):
    if x.str.contains('l'):
        return 1
    elif x.str.contains('L'):
        return 1
    else:
        return 0
```
(BTW: if something is measured in 'oz', is it liquid?)
Then I tried to apply it:
```python
df['is_liquid'] = df['quantity'].apply(is_liquid)
```
But all I get is this error:
```
AttributeError: 'str' object has no attribute 'str'
```
Could someone help me out?
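The `AttributeError` comes from how `Series.apply` works: it calls the function once per element, so inside `is_liquid` the argument `x` is a plain Python `str`, and the `.str` accessor only exists on a pandas `Series`. A minimal corrected sketch of the element-wise version, keeping the original rule (any `l` or `L` means liquid) and assuming that missing quantities should count as 0. The sample rows are taken from the question, plus one missing value:

```python
import pandas as pd


def is_liquid(x):
    """Element-wise check: x is a single cell value, not a Series."""
    # Missing quantities (None/NaN) are not strings and have no .lower()
    if not isinstance(x, str):
        return 0
    # Plain substring test; lower() covers both 'l' and 'L'
    return 1 if "l" in x.lower() else 0


df = pd.DataFrame({"quantity": ["365 g (314 ml)", "992 g", "2.46 kg",
                                "0,33 litre", "15.87oz", "250 ml",
                                "1 L", "33 cl", None]})
df["is_liquid"] = df["quantity"].apply(is_liquid)
```

This works, but it runs a Python function per row; the vectorized string methods are usually the more idiomatic choice for a whole column.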
Use `str.contains` with `case=False` to get a boolean mask, and convert it to integers with `Series.astype`:
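A minimal sketch of that approach, assuming the DataFrame is called `df` as in the question (the sample rows are taken from the question, plus one missing value). It applies the question's rule literally, so any string containing `l` in either case counts as liquid. The `na=False` argument is an addition for missing quantities:

```python
import pandas as pd

df = pd.DataFrame({"quantity": ["365 g (314 ml)", "992 g", "2.46 kg",
                                "0,33 litre", "15.87oz", "250 ml",
                                "1 L", "33 cl", None]})

# Boolean mask: True where the string contains 'l' in either case.
# na=False treats missing quantities as "not liquid"; without it those rows
# would hold missing values, which astype(int) cannot convert to integers.
mask = df["quantity"].str.contains("l", case=False, na=False)

# Convert True/False to 1/0
df["is_liquid"] = mask.astype(int)
```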