Column with list of strings in python

I have a pandas dataframe like the following:

                                          categories  review_count
0                  [Burgers, Fast Food, Restaurants]           137
1                         [Steakhouses, Restaurants]           176
2  [Food, Coffee & Tea, American (New), Restaurants]           390
...                                          ....              ...

From this DataFrame, I would like to extract only those rows where the list in the 'categories' column contains the category 'Restaurants'. So far I have tried:

df[[df.categories.isin('Restaurants'), review_count]]

Since I also have other columns in the DataFrame, I specified the two columns that I want to extract. But I get the error:

TypeError: unhashable type: 'list'

I don't have much idea what this error means as I am very new to pandas. Please let me know how I can achieve my goal of extracting only those rows from the dataFrame wherein the 'categories' column for that row has the string 'Restaurants' as part of the categories_list. Any help would be much appreciated.

Thanks in advance!

Tags: python pandas slice dataframe

3 answers

太酷不给撩

#2 · 2019-03-25 19:36

OK, I've been trying to figure out an answer to this for quite a while now but have come up empty (short of writing a small recursive program to expand the lists). I think that's because, at first blush anyway, what you're trying to do isn't really efficient (Jimmy C's comment about the lists being mutable is on point here) and isn't the way you would usually do this in pandas.

A better and (I think) faster way would be to store each category from your nested lists as its own boolean column, so that you'd have:

df
    review_count    Burgers   Fast Food   Restaurants    Steakhouses  Food    CoffeeTea  American (New)
0            137    True      True        True           False        False   False      False
1            176    False     False       True           True         False   False      False
2            390    False     False       True           False        True    True       True

Obviously, this would involve writing a Python program to pull the categories out of their nested lists and export the result to a DataFrame, but this one-time hit (for the existing data) may be worthwhile for what you gain in using pandas to analyze the resulting DataFrame.
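
As a rough sketch of how such indicator columns could be built, assuming pandas 0.25 or later (for Series.explode) and that every row of 'categories' holds a Python list. The names exploded, indicators and wide are illustrative only and do not come from the original post:

```python
import pandas as pd

df = pd.DataFrame({
    "categories": [
        ["Burgers", "Fast Food", "Restaurants"],
        ["Steakhouses", "Restaurants"],
        ["Food", "Coffee & Tea", "American (New)", "Restaurants"],
    ],
    "review_count": [137, 176, 390],
})

# One row per (original row label, category); the row label is repeated
exploded = df["categories"].explode()

# One indicator column per category, then collapse back to one row per label
indicators = pd.get_dummies(exploded).groupby(level=0).any()

# Attach the boolean columns to the remaining data
wide = df[["review_count"]].join(indicators)

# Filtering is now an ordinary boolean-column lookup
restaurants = wide[wide["Restaurants"]]
steakhouses = wide[wide["Steakhouses"]]
```

With this layout, combining conditions (for example wide["Restaurants"] & wide["Food"]) needs no per-row Python function.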

There's a section in Wes's book Python for Data Analysis called "Computing Indicator/Dummy Variables" (around p. 330 or so) which would be a good resource for this sort of operation.

Sorry, that doesn't really answer your question, and I don't know how feasible it is. Otherwise, you can try rtrwalker's solution, which looks pretty good, but note that it relies on the development branch, just FYI.


疯言疯语

#3 · 2019-03-25 19:39

I think you may have to use a lambda function for this. You can test whether a value in your column is in some sequence (isin), but pandas doesn't seem to provide a function for testing whether the sequence in your column contains some value:

import pandas as pd
categories = [['fast_food', 'restaurant'], ['coffee', 'cafe'], ['burger', 'restaurant']]
counts = [137, 176, 390]
df = pd.DataFrame({'categories': categories, 'review_count': counts})
# Show which rows contain 'restaurant'
df.categories.map(lambda x: 'restaurant' in x)
# Subset the dataframe using this:
df[df.categories.map(lambda x: 'restaurant' in x)]

Output:

Out[11]: 
                categories  review_count
0  [fast_food, restaurant]           137
2     [burger, restaurant]           390
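
Since the question also asks for only two of several columns, the same boolean mask can be passed to .loc together with a column list. This is a minimal sketch; the extra 'city' column and the names mask and subset are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "categories": [["fast_food", "restaurant"], ["coffee", "cafe"], ["burger", "restaurant"]],
    "review_count": [137, 176, 390],
    "city": ["A", "B", "C"],  # hypothetical extra column, not in the original data
})

# True for rows whose list contains 'restaurant'
mask = df["categories"].map(lambda cats: "restaurant" in cats)

# Rows selected by the mask, columns selected by name
subset = df.loc[mask, ["categories", "review_count"]]
```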


叼着烟拽天下

#4 · 2019-03-25 19:46

I think that from pandas 0.13 on you can do things like:

df.query('"Restaurants" in categories')

See the docs for pandas.DataFrame.query.

