Pandas - Groupby and create new DataFrame?

Posted 2019-06-19 23:01

Question:

This is my situation -

In[1]: data
Out[1]: 
     Item                    Type
0  Orange           Edible, Fruit
1  Banana           Edible, Fruit
2  Tomato       Edible, Vegetable
3  Laptop  Non Edible, Electronic

In[2]: type(data)
Out[2]: pandas.core.frame.DataFrame

What I want to do is create a data frame of only Fruits, so I need to group by in such a way that "Fruit" appears in Type.

I've tried doing this:

grouped = data.groupby(lambda x: "Fruit" in x, axis=1)

I don't know if that's the right way to do it; I'm having a tough time understanding groupby. How do I get a new DataFrame of only Fruits?

Answer 1:

You could use

data[data['Type'].str.contains('Fruit')]

import pandas as pd

# Rebuild the example DataFrame from the question
data = pd.DataFrame({'Item':['Orange', 'Banana', 'Tomato', 'Laptop'],
                     'Type':['Edible, Fruit', 'Edible, Fruit', 'Edible, Vegetable', 'Non Edible, Electronic']})
# Keep only the rows whose Type contains the substring 'Fruit'
print(data[data['Type'].str.contains('Fruit')])

yields

     Item           Type
0  Orange  Edible, Fruit
1  Banana  Edible, Fruit


Answer 2:

groupby does something else entirely: it splits the data into groups, usually so that each group can then be aggregated. Basically, it goes from something like:

['a', 'b', 'a', 'c', 'b', 'b']

to something like:

[['a', 'a'], ['b', 'b', 'b'], ['c']]
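A small sketch of that split-then-aggregate idea on the same toy list, assuming it is stored in a pandas Series and grouped by its own values (groupby sorts the group keys by default):

```python
import pandas as pd

values = pd.Series(['a', 'b', 'a', 'c', 'b', 'b'])

# Split: grouping the Series by its own values collects equal values together
groups = {key: group.tolist() for key, group in values.groupby(values)}
print(groups)  # {'a': ['a', 'a'], 'b': ['b', 'b', 'b'], 'c': ['c']}

# Aggregate: the usual next step, here counting the members of each group
counts = values.groupby(values).size()
print(counts.to_dict())  # {'a': 2, 'b': 3, 'c': 1}
```

Filtering rows, as in the question, needs no split or aggregation, which is why groupby is not the right tool here.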

What you want is apply (here, Series.apply on the Type column), not groupby.

In newer versions of pandas there's also a DataFrame.query method, which can make filters like this shorter to write.
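A rough sketch of what a query-based filter might look like for this data. It assumes engine='python', because method calls such as .str.contains may not be supported by the numexpr engine that pandas uses by default when numexpr is installed; with the python engine there is no speed advantage over a plain boolean mask.

```python
import pandas as pd

df = pd.DataFrame({'Item': ['Orange', 'Banana', 'Tomato', 'Laptop'],
                   'Type': ['Edible, Fruit', 'Edible, Fruit', 'Edible, Vegetable',
                            'Non Edible, Electronic']})

# The query string is evaluated against the columns; engine='python' is an assumption
# made so that the .str accessor call is allowed inside the expression
fruits = df.query('Type.str.contains("Fruit")', engine='python')
print(fruits)
```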

That said, one way of doing what you want is to make a boolean array by using

mask = df.Type.apply(lambda x: 'Fruit' in x)

Then select the relevant rows of the data frame with df[mask]. Or, as a one-liner:

df[df.Type.apply(lambda x: 'Fruit' in x)]

As a full example:

import pandas as pd

data = [['Orange', 'Edible, Fruit'],
        ['Banana', 'Edible, Fruit'],
        ['Tomato', 'Edible, Vegetable'],
        ['Laptop', 'Non Edible, Electronic']]
df = pd.DataFrame(data, columns=['Item', 'Type'])

# Keep only the rows whose Type string contains 'Fruit'
print(df[df.Type.apply(lambda x: 'Fruit' in x)])
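Note that 'Fruit' in x on a string is a substring test, so any Type that merely contains the word would also match. A minimal sketch of an exact-tag check, assuming the tags in Type are always separated by ", " (the 'Juicer' row is hypothetical, added only to show the difference):

```python
import pandas as pd

df = pd.DataFrame({'Item': ['Orange', 'Banana', 'Tomato', 'Laptop', 'Juicer'],
                   'Type': ['Edible, Fruit', 'Edible, Fruit', 'Edible, Vegetable',
                            'Non Edible, Electronic', 'Non Edible, Fruit Press']})

# Substring test: 'Fruit Press' also contains 'Fruit'
substring_mask = df.Type.apply(lambda x: 'Fruit' in x)

# Tag test: split each Type into a list of tags, then check list membership
tag_mask = df.Type.str.split(', ').apply(lambda tags: 'Fruit' in tags)

print(df[substring_mask].Item.tolist())  # ['Orange', 'Banana', 'Juicer']
print(df[tag_mask].Item.tolist())        # ['Orange', 'Banana']
```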