I have a pandas dataframe like the following:
   categories                                         review_count
0  [Burgers, Fast Food, Restaurants]                  137
1  [Steakhouses, Restaurants]                         176
2  [Food, Coffee & Tea, American (New), Restaurants]  390
...  ....                                             ...
...  ....                                             ...
...  ....                                             ...
From this DataFrame, I would like to extract only those rows in which the list in the 'categories' column contains the category 'Restaurants'. So far I have tried:
df[[df.categories.isin('Restaurants'),review_count]]
As I also have other columns in the DataFrame, I specified the two columns that I want to extract. But I get the error:
TypeError: unhashable type: 'list'
I don't really know what this error means, as I am very new to pandas. Please let me know how I can extract only those rows from the DataFrame in which the 'categories' column contains the string 'Restaurants' as part of the categories list. Any help would be much appreciated.
Thanks in advance!
OK, so I've been trying to figure out an answer to this for quite a while now, but I have come up empty (short of writing a small recursive program to expand the lists). I think that's because, at first blush anyway, what you're trying to do isn't really efficient (Jimmy C's comment about the lists being mutable is on point here) and isn't the way you would usually do this in pandas.
A better and (I think) faster way would be to store the contents of your nested lists as column values, with one column per category.
Obviously, this would involve writing a Python program to pull your categories out of their nested lists and export them to a DataFrame, but this one-time hit (for the existing data) may be worthwhile for what you gain in using pandas to analyze the resulting DataFrame.
There's a section in Wes's book Python for Data Analysis called "Computing Indicator/Dummy Variables" (around p. 330 or so) which would be a good resource for this sort of operation.
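As a rough illustration of the indicator-column layout described above, here is a minimal sketch. It is not taken from the book; the first three rows come from the question, the fourth row is a hypothetical non-restaurant entry added only so the filter has something to exclude, and the names `all_categories`, `indicators` and `wide` are made up for this example.

```python
import pandas as pd

# First three rows are from the question; the last row is hypothetical.
df = pd.DataFrame({
    "categories": [
        ["Burgers", "Fast Food", "Restaurants"],
        ["Steakhouses", "Restaurants"],
        ["Food", "Coffee & Tea", "American (New)", "Restaurants"],
        ["Shopping", "Books"],  # hypothetical row for illustration
    ],
    "review_count": [137, 176, 390, 12],
})

# Collect every distinct category that appears in any row's list.
all_categories = sorted({cat for cats in df["categories"] for cat in cats})

# Build one 0/1 indicator column per category.
indicators = pd.DataFrame(
    {cat: [int(cat in cats) for cats in df["categories"]] for cat in all_categories},
    index=df.index,
)

# Replace the nested lists with the indicator columns.
wide = pd.concat([df[["review_count"]], indicators], axis=1)

# Filtering is now an ordinary column comparison.
restaurants = wide[wide["Restaurants"] == 1]
print(restaurants[["review_count", "Restaurants"]])
```

The one-time cost is the loop over the lists; after that, every category test is a plain vectorized comparison on a column.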
Sorry, that doesn't really answer your question, and I certainly don't know how feasible it is for you. Otherwise, you can try rtrwalker's solution, which looks pretty good, but note that it relies on the development branch, just FYI.
I think you may have to use a lambda function for this, since you can test whether a value in your column isin some sequence, but pandas doesn't seem to provide a function for testing whether the sequence in your column contains some value.
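A minimal sketch of the lambda approach, assuming the 'categories' column holds plain Python lists. The first three rows are from the question; the fourth is a hypothetical row without 'Restaurants', and `mask` and `result` are names chosen for this example.

```python
import pandas as pd

# First three rows are from the question; the last row is hypothetical.
df = pd.DataFrame({
    "categories": [
        ["Burgers", "Fast Food", "Restaurants"],
        ["Steakhouses", "Restaurants"],
        ["Food", "Coffee & Tea", "American (New)", "Restaurants"],
        ["Shopping", "Books"],  # hypothetical row for illustration
    ],
    "review_count": [137, 176, 390, 12],
})

# Test each row's list for membership; this yields a boolean Series.
mask = df["categories"].apply(lambda cats: "Restaurants" in cats)

# Keep the matching rows and only the two columns of interest.
result = df.loc[mask, ["categories", "review_count"]]
print(result)
```

This runs a Python-level check for every row, so it is not vectorized, but it avoids the hashing that makes isin fail on list values.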
I think in pandas 0.12 you can do things like:
docs at pandas.DataFrame.query