Column with list of strings in python

Posted 2019-03-25 18:56

I have a pandas dataframe like the following:

                                          categories  review_count
0                  [Burgers, Fast Food, Restaurants]           137
1                         [Steakhouses, Restaurants]           176
2  [Food, Coffee & Tea, American (New), Restaurants]           390
...                                          ....              ...
...                                          ....              ...
...                                          ....              ...

From this DataFrame, I would like to extract only those rows where the list in the 'categories' column contains the category 'Restaurants'. So far I have tried: df[[df.categories.isin('Restaurants'),review_count]]

Since I also have other columns in the DataFrame, I specified the two columns that I want to extract. But I get the error:

TypeError: unhashable type: 'list'

I don't really know what this error means, as I am very new to pandas. Please let me know how I can extract only those rows of the DataFrame where the list in the 'categories' column contains the string 'Restaurants'. Any help would be much appreciated.

Thanks in advance!

3 answers
太酷不给撩
Reply #2 · 2019-03-25 19:36

OK, so I've been trying to work out an answer to this for quite a while now and have come up empty (short of writing a small recursive program to expand the lists). I think that's because, at first blush anyway, what you're trying to do isn't very efficient (Jimmy C's comment about lists being mutable is on point here) and isn't how you would usually do this in pandas.
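To make the mutability point concrete, here is a minimal plain-Python sketch (no pandas involved, and the variable names are just for illustration). It only shows that a list cannot be hashed while a tuple with the same contents can. I'm not claiming this is exactly where pandas trips up internally, only that it is what the error message is reporting:

```python
# Lists are mutable, so Python refuses to hash them.
categories = ["Burgers", "Fast Food", "Restaurants"]

try:
    hash(categories)
except TypeError as exc:
    # Capture the message so it can be compared with the one in the question.
    message = str(exc)

# An immutable tuple with the same contents hashes without complaint.
tuple_hash = hash(tuple(categories))

print(message)
```

Anything that needs to hash the cell values (set membership, dictionary keys, and so on) will hit this as long as the cells hold lists.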

A better and (I think) faster way would be to store each category from your nested lists as its own True/False column, so that you'd have:

df
   review_count  Burgers  Fast Food  Restaurants  Steakhouses   Food  Coffee & Tea  American (New)
0           137     True       True         True        False  False         False           False
1           176    False      False         True         True  False         False           False
2           390    False      False         True        False   True          True            True

Obviously, this would involve writing a Python program to pull your categories out of their nested lists and export them to a DataFrame, but this one-time hit (for the existing data) may be worthwhile for what you gain by using pandas to analyze the resulting DataFrame.

There's a section in Wes McKinney's book Python for Data Analysis called "Computing Indicator/Dummy Variables" (around p. 330 or so) which would be a good resource for this sort of operation.
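As a rough sketch of the indicator-column layout described above (this is my own illustration, not the approach from the book), one boolean column can be built per category with plain Python loops. The sample data is copied from the question; the names all_categories, indicators, wide and restaurants are made up for this example:

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "categories": [
        ["Burgers", "Fast Food", "Restaurants"],
        ["Steakhouses", "Restaurants"],
        ["Food", "Coffee & Tea", "American (New)", "Restaurants"],
    ],
    "review_count": [137, 176, 390],
})

# Collect every distinct category, keeping first-seen order.
all_categories = []
for cats in df["categories"]:
    for cat in cats:
        if cat not in all_categories:
            all_categories.append(cat)

# Build one boolean column per category: True where the row's list contains it.
indicators = pd.DataFrame(
    {cat: [cat in cats for cats in df["categories"]] for cat in all_categories},
    index=df.index,
)

# Put the indicator columns next to the original review counts.
wide = pd.concat([df[["review_count"]], indicators], axis=1)

# Filtering is now an ordinary boolean-column selection.
restaurants = wide[wide["Restaurants"]]
print(restaurants)
```

Once the data is in this shape, questions like "which rows are restaurants?" no longer need to look inside a list at all.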

Sorry, that doesn't really answer your question, and I certainly don't know how feasible it is for you. Otherwise, you can try rtrwalker's solution, which looks pretty good, though it relies on the development branch, just FYI.

疯言疯语
Reply #3 · 2019-03-25 19:39

I think you may have to use a lambda function for this. With isin you can test whether a value in your column is in some sequence, but pandas doesn't seem to provide a function for testing whether the sequence in your column contains some value:

import pandas as pd
categories = [['fast_food', 'restaurant'], ['coffee', 'cafe'], ['burger', 'restaurant']]
counts = [137, 176, 390]
df = pd.DataFrame({'categories': categories, 'review_count': counts})
# Show which rows contain 'restaurant'
df.categories.map(lambda x: 'restaurant' in x)
# Subset the dataframe using this:
df[df.categories.map(lambda x: 'restaurant' in x)]

Output:

Out[11]: 
                categories  review_count
0  [fast_food, restaurant]           137
2     [burger, restaurant]           390
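
Since the question also asks for just two columns out of a wider DataFrame, the same mask can be combined with .loc to pick rows and columns in one step. This is only a sketch: the city column and its values are invented here to stand in for the "other columns" mentioned in the question.

```python
import pandas as pd

categories = [['fast_food', 'restaurant'], ['coffee', 'cafe'], ['burger', 'restaurant']]
counts = [137, 176, 390]
df = pd.DataFrame({
    'categories': categories,
    'review_count': counts,
    'city': ['Phoenix', 'Tempe', 'Mesa'],  # hypothetical extra column
})

# Boolean mask: True for rows whose list contains 'restaurant'.
mask = df.categories.map(lambda x: 'restaurant' in x)

# Select the matching rows and only the two columns of interest.
subset = df.loc[mask, ['categories', 'review_count']]
print(subset)
```

The row selection and the column selection stay separate arguments to .loc, which avoids mixing a boolean Series and column names inside one list as in the original attempt.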
叼着烟拽天下
Reply #4 · 2019-03-25 19:46

I think in pandas 0.12 you can do things like:

df.query('"Restaurants" in categories')

See the docs for pandas.DataFrame.query.
