This is my situation:
In[1]: data
Out[1]:
     Item                    Type
0  Orange           Edible, Fruit
1  Banana           Edible, Fruit
2  Tomato       Edible, Vegetable
3  Laptop  Non Edible, Electronic
In[2]: type(data)
Out[2]: pandas.core.frame.DataFrame
What I want to do is create a data frame of only Fruits, so I need to groupby in such a way that Fruit exists in Type.
I've tried doing this:
grouped = data.groupby(lambda x: "Fruit" in x, axis=1)
I don't know if that's the right way to do it; I'm having a tough time understanding groupby. How do I get a new DataFrame of only Fruits?
You could use
data[data['Type'].str.contains('Fruit')]
import pandas as pd
data = pd.DataFrame({'Item': ['Orange', 'Banana', 'Tomato', 'Laptop'],
                     'Type': ['Edible, Fruit', 'Edible, Fruit', 'Edible, Vegetable',
                              'Non Edible, Electronic']})
print(data[data['Type'].str.contains('Fruit')])
yields
     Item           Type
0  Orange  Edible, Fruit
1  Banana  Edible, Fruit
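If the Type column can contain missing values, or if the search text should be treated literally rather than as a regular expression, the same idea can be sketched as follows. The extra 'Unknown' row with a missing Type is hypothetical and is there only to show what na=False does.

```python
import pandas as pd

# Same data as above, plus one hypothetical row with a missing Type.
data = pd.DataFrame({'Item': ['Orange', 'Banana', 'Tomato', 'Laptop', 'Unknown'],
                     'Type': ['Edible, Fruit', 'Edible, Fruit', 'Edible, Vegetable',
                              'Non Edible, Electronic', None]})

# regex=False matches 'Fruit' as plain text;
# na=False treats missing values as "no match" instead of propagating NaN.
mask = data['Type'].str.contains('Fruit', regex=False, na=False)

fruits = data[mask]
print(fruits)
```

Without na=False, the missing entry would make the mask contain a non-boolean value, and indexing with it would typically fail.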
groupby does something else entirely. It creates groups for aggregation. Basically, it goes from something like:
['a', 'b', 'a', 'c', 'b', 'b']
to something like:
[['a', 'a'], ['b', 'b', 'b'], ['c']]
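To make that concrete, here is a minimal sketch using a pandas Series of the same letters. The variable names are just for illustration.

```python
import pandas as pd

letters = pd.Series(['a', 'b', 'a', 'c', 'b', 'b'])

# Group the values by themselves; iterating over the result yields (key, sub-Series) pairs.
grouped = letters.groupby(letters)
groups = {key: list(values) for key, values in grouped}
print(groups)

# The usual next step is an aggregation over each group, e.g. counting members.
sizes = grouped.size()
print(sizes)
```

Each group keeps all of its rows, so groupby partitions the data rather than filtering it.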
What you want is df.apply.
In newer versions of pandas there's a query method that makes this a bit more efficient and easier.
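As a rough sketch of the query approach, assuming a pandas version where string methods can be called inside the expression (engine="python" is passed here so the expression is evaluated by pandas itself rather than numexpr):

```python
import pandas as pd

df = pd.DataFrame({'Item': ['Orange', 'Banana', 'Tomato', 'Laptop'],
                   'Type': ['Edible, Fruit', 'Edible, Fruit', 'Edible, Vegetable',
                            'Non Edible, Electronic']})

# The expression is evaluated against the column names of df.
fruits = df.query("Type.str.contains('Fruit')", engine="python")
print(fruits)
```

Whether this is actually faster than a boolean mask likely depends on the expression and the size of the data, so it is worth timing on your own frame.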
However, one way of doing what you want is to make a boolean array by using
mask = df.Type.apply(lambda x: 'Fruit' in x)
And then selecting the relevant portions of the data frame with df[mask]. Or, as a one-liner:
df[df.Type.apply(lambda x: 'Fruit' in x)]
As a full example:
import pandas as pd
data = [['Orange', 'Edible, Fruit'],
        ['Banana', 'Edible, Fruit'],
        ['Tomato', 'Edible, Vegetable'],
        ['Laptop', 'Non Edible, Electronic']]
df = pd.DataFrame(data, columns=['Item', 'Type'])
print(df[df.Type.apply(lambda x: 'Fruit' in x)])
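One thing apply makes easy is a stricter test than a substring match. The sketch below assumes Type holds comma-separated tags, and adds a hypothetical 'Cake' row to show that 'Fruitcake' is not mistaken for 'Fruit'. The helper has_tag is an illustrative name, not part of pandas.

```python
import pandas as pd

data = [['Orange', 'Edible, Fruit'],
        ['Banana', 'Edible, Fruit'],
        ['Tomato', 'Edible, Vegetable'],
        ['Cake', 'Edible, Fruitcake']]  # hypothetical row, for illustration only
df = pd.DataFrame(data, columns=['Item', 'Type'])


def has_tag(type_string, tag):
    """Return True if tag is one of the comma-separated entries in type_string."""
    return tag in [part.strip() for part in type_string.split(',')]


# Exact tag match: 'Fruitcake' does not count as 'Fruit'.
exact = df[df.Type.apply(lambda x: has_tag(x, 'Fruit'))]

# Plain substring match, as in the one-liner above, also picks up 'Fruitcake'.
substring = df[df.Type.apply(lambda x: 'Fruit' in x)]

print(exact)
print(substring)
```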