I'm looking to turn a pandas cell containing a list into rows for each of those values.
So, take this:
If I'd like to unpack and stack the values in the 'nearest_neighbors' column so that each value would be a row within each 'opponent' index, how would I best go about this? Are there pandas methods that are meant for operations like this? I'm just not aware of any.
Thanks in advance, guys.
Use apply(pd.Series) and stack, then reset_index and to_frame.
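A minimal sketch of that chain, assuming a small made-up frame (the neighbor values are placeholders, not the asker's data). The extra dropna() is an addition here, so the NaN padding is removed whether or not stack() drops it in the installed pandas version:

```python
import pandas as pd

# Illustrative input (assumed): a list-valued column under a (name, opponent) index.
df = pd.DataFrame(
    {
        "name": ["A.J. Price", "A.J. Price"],
        "opponent": ["76ers", "blazers"],
        "nearest_neighbors": [["p1", "p2"], ["p3", "p4", "p5"]],
    }
).set_index(["name", "opponent"])

result = (
    df["nearest_neighbors"]
    .apply(pd.Series)                  # one column per list position, NaN-padded
    .stack()                           # wide -> long, adds a list-position index level
    .dropna()                          # remove padding if stack() kept it
    .reset_index(level=2, drop=True)   # discard the list-position level
    .to_frame("nearest_neighbors")     # back to a one-column DataFrame
)
print(result)
```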
Details
The fastest method I found so far is extending the DataFrame with .iloc and assigning back the flattened target column.
Given the usual input (replicated a bit):
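As a rough sketch of the .iloc idea, assuming a small made-up input (placeholder values, not the original data) and a hypothetical signature extend_iloc(df, col_target):

```python
import numpy as np
import pandas as pd

# Illustrative input (assumed), replicated once to mimic a slightly larger frame.
base = pd.DataFrame(
    {
        "name": ["A.J. Price", "A.J. Price"],
        "opponent": ["76ers", "blazers"],
        "nearest_neighbors": [["p1", "p2"], ["p3", "p4", "p5"]],
    }
)
df = pd.concat([base, base], ignore_index=True)


def extend_iloc(df, col_target):
    """Repeat each row once per list element, then assign the flattened lists back."""
    # Flatten the column of lists into one long list.
    col_flat = [item for sublist in df[col_target] for item in sublist]
    # Positional row numbers, each repeated by the length of its list.
    lens = df[col_target].apply(len)
    ilocations = np.repeat(np.arange(len(df)), lens)
    # Positions of every column except the target.
    cols = [i for i, c in enumerate(df.columns) if c != col_target]
    new_df = df.iloc[ilocations, cols].copy()
    new_df[col_target] = col_flat
    return new_df


flat = extend_iloc(df, "nearest_neighbors")
print(flat)
```

Rows whose list is empty are repeated zero times, so they disappear from the result.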
Given the following suggested alternatives:
I find that extend_iloc() is the fastest:
Extending Oleg's .iloc answer to automatically flatten all list-columns:
This assumes that each list-column has equal list length.
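One way this extension might look, as a sketch only: the function name flatten_list_columns and the rule for detecting list-columns (every value is a Python list) are assumptions, and the sample data is made up.

```python
import numpy as np
import pandas as pd


def flatten_list_columns(df):
    """Flatten every list-column at once.

    Assumes that, within each row, all list-columns hold lists of equal length.
    """
    # Treat a column as a list-column when every value in it is a list (assumed rule).
    list_cols = [
        c for c in df.columns
        if len(df) and df[c].map(lambda v: isinstance(v, list)).all()
    ]
    if not list_cols:
        return df.copy()
    # Row positions repeated by list length, taken from the first list-column.
    lens = df[list_cols[0]].map(len)
    ilocations = np.repeat(np.arange(len(df)), lens)
    other_pos = [i for i, c in enumerate(df.columns) if c not in list_cols]
    out = df.iloc[ilocations, other_pos].copy()
    for c in list_cols:
        out[c] = [item for sublist in df[c] for item in sublist]
    # Restore the original column order.
    return out[list(df.columns)]


df = pd.DataFrame({"id": [1, 2], "a": [[1, 2], [3]], "b": [["x", "y"], ["z"]]})
flat = flatten_list_columns(df)
print(flat)
```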
So all of these answers are good but I wanted something ^really simple^ so here's my contribution:
That's it. Just use this when you want a new series where the lists are 'exploded'. Here's an example where we do value_counts() on taco choices :)
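A sketch of what such a helper could look like, assuming it is a plain list comprehension wrapped in a new Series; the name explode_series and the taco data are made up for illustration:

```python
import pandas as pd


def explode_series(series):
    """Return a new Series with one entry per list element (the index is not kept)."""
    return pd.Series([item for sublist in series for item in sublist], name=series.name)


# Made-up taco choices: each cell holds the list of tacos one person picked.
tacos = pd.Series(
    [["carnitas", "al pastor"], ["al pastor"], ["pollo", "al pastor"]],
    name="choice",
)
counts = explode_series(tacos).value_counts()
print(counts)
```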
In the code below, I first reset the index to make the row iteration easier.
I create a list of lists where each element of the outer list is a row of the target DataFrame and each element of the inner list is one of the columns. This nested list will ultimately be concatenated to create the desired DataFrame.
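The steps this answer describes (resetting the index, collecting rows in a nested list, a lambda with list iteration, and building the final DataFrame) can be sketched roughly as follows. The input values are placeholders, and the exact shape of the original code is an assumption:

```python
import pandas as pd

# Illustrative input (assumed): a list-valued column under a (name, opponent) index.
df = pd.DataFrame(
    {
        "name": ["A.J. Price", "A.J. Price"],
        "opponent": ["76ers", "blazers"],
        "nearest_neighbors": [["p1", "p2"], ["p3", "p4", "p5"]],
    }
).set_index(["name", "opponent"])

# Reset the index so name and opponent are ordinary columns during row iteration.
flat = df.reset_index()

# Each inner list is one row of the target DataFrame: [name, opponent, neighbor].
rows = []
_ = flat.apply(
    lambda row: [
        rows.append([row["name"], row["opponent"], nn])
        for nn in row["nearest_neighbors"]
    ],
    axis=1,
)

# Rebuild a DataFrame with the original column names and restore the index.
df_new = pd.DataFrame(rows, columns=flat.columns).set_index(["name", "opponent"])
print(df_new)
```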
I use a lambda function together with list iteration to create a row for each element of the nearest_neighbors paired with the relevant name and opponent.
Finally, I create a new DataFrame from this list (using the original column names and setting the index back to name and opponent).
EDIT JUNE 2017
An alternative method is as follows:
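One possible alternative, shown here as an assumption rather than as the answer's exact code, is to widen the lists with apply(pd.Series) and reshape with pd.melt. The input values are placeholders:

```python
import pandas as pd

# Illustrative input (assumed): a list-valued column under a (name, opponent) index.
df = pd.DataFrame(
    {
        "name": ["A.J. Price", "A.J. Price"],
        "opponent": ["76ers", "blazers"],
        "nearest_neighbors": [["p1", "p2"], ["p3", "p4", "p5"]],
    }
).set_index(["name", "opponent"])

result = (
    pd.melt(
        # One column per list position, with name and opponent as regular columns.
        df["nearest_neighbors"].apply(pd.Series).reset_index(),
        id_vars=["name", "opponent"],
        value_name="nearest_neighbors",
    )
    .set_index(["name", "opponent"])
    .drop(columns="variable")   # the list position is not needed
    .dropna()                   # remove NaN padding from shorter lists
    .sort_index()               # group rows by (name, opponent) again
)
print(result)
```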
I think this is a really good question. In Hive you would use EXPLODE, and I think there is a case to be made that pandas should include this functionality by default. I would probably explode the list column with a nested generator comprehension like this:
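A sketch of such a comprehension, assuming the frame is indexed by (name, opponent) and using placeholder neighbor values:

```python
import pandas as pd

# Illustrative input (assumed): a list-valued column under a (name, opponent) index.
df = pd.DataFrame(
    {
        "name": ["A.J. Price", "A.J. Price"],
        "opponent": ["76ers", "blazers"],
        "nearest_neighbors": [["p1", "p2"], ["p3", "p4", "p5"]],
    }
).set_index(["name", "opponent"])

# Outer loop walks the rows, inner loop walks each row's list.
rows = (
    {"name": name, "opponent": opponent, "nearest_neighbors": neighbor}
    for (name, opponent), row in df.iterrows()
    for neighbor in row["nearest_neighbors"]
)

# Materialize the generator before handing it to the DataFrame constructor.
result = pd.DataFrame(list(rows)).set_index(["name", "opponent"])
print(result)
```

For comparison, pandas 0.25 and later provide Series.explode and DataFrame.explode, which should give the same rows for this input.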