I have a nested list of paired data in the format:
mylist = [['item1', 'some other stuff', 'value1'],['item1', 'some other stuff', 'value2'],['item2', 'some other stuff', 'value3'],['item2', 'some other stuff', 'value4']]
I have no idea how to do the following, but I need the list to be grouped like this:
[['item1', 'value1', 'value2'], ['item2', 'value3', 'value4']]
In other words, when an item appears several times in the list with different values, all of its values should be collected into a single entry for that item.
Any help would be greatly appreciated.
Thanks
Let's start by using a dictionary to map items to lists of values. That's going to be a lot easier (and faster) than a list, because figuring out which list to add the new value to is just mydict[item] instead of having to write some kind of linear-search function.
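Here is a minimal sketch of that dictionary step. It assumes, as in the question's data, that the item is always the first element of each row and the value you want is the last one; the middle "some other stuff" is ignored.

```python
mylist = [['item1', 'some other stuff', 'value1'],
          ['item1', 'some other stuff', 'value2'],
          ['item2', 'some other stuff', 'value3'],
          ['item2', 'some other stuff', 'value4']]

# Map each item (first column) to the list of its values (last column).
mydict = {}
for row in mylist:
    item, value = row[0], row[-1]
    # setdefault inserts an empty list the first time an item is seen,
    # then returns the (existing or new) list so we can append to it.
    mydict.setdefault(item, []).append(value)

print(mydict)
# {'item1': ['value1', 'value2'], 'item2': ['value3', 'value4']}
```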
Now we can convert that dictionary back to a list, if you want.
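One way to do the conversion is a list comprehension that puts each item in front of its values. The dictionary literal below just stands in for the result of the grouping step, so the snippet runs on its own.

```python
# Result of the grouping step: item -> list of values.
mydict = {'item1': ['value1', 'value2'], 'item2': ['value3', 'value4']}

# Each output row is the item followed by all of its values.
grouped = [[item] + values for item, values in mydict.items()]

print(grouped)
# [['item1', 'value1', 'value2'], ['item2', 'value3', 'value4']]
```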
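The big downside here is that, on Python versions before 3.7, a plain dict doesn't keep its keys in insertion order, so once you stick things into it you lose any original order. If you were expecting item1 to come first because its first entry came before item2's first entry (or maybe because item2's last entry came after item1's?), you've lost that. If that order is important, you can use an OrderedDict. Since Python 3.7, a plain dict does preserve insertion order, so keys come out in the order each item first appeared.

The big upside is that often, you actually want a dictionary in the end, not a list.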
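If you need the OrderedDict route (for example on an older Python, or just to make the ordering requirement explicit), the grouping loop barely changes. This is a sketch that keeps items in order of their first appearance; it again assumes the item is the first column and the value is the last.

```python
from collections import OrderedDict

mylist = [['item1', 'some other stuff', 'value1'],
          ['item1', 'some other stuff', 'value2'],
          ['item2', 'some other stuff', 'value3'],
          ['item2', 'some other stuff', 'value4']]

# OrderedDict remembers the order in which keys were first inserted,
# so items come out in the order they first appear in mylist.
ordered = OrderedDict()
for row in mylist:
    ordered.setdefault(row[0], []).append(row[-1])

result = [[item] + values for item, values in ordered.items()]
print(result)
# [['item1', 'value1', 'value2'], ['item2', 'value3', 'value4']]
```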
The smaller upside is that, if your data aren't sorted, groupby(…sorted(…)) requires an O(N log N) sort, while this solution is O(N). Usually, that won't make a difference. And if it does, the constant-factor differences for a given Python implementation and platform might outweigh the asymptotic difference anyway. But if performance matters, test both solutions and use the faster one.

You could use itertools.groupby, but if the list is not sorted by the first item, you have to sort it first, because groupby only groups consecutive elements that share a key. That means this results in O(N log N) complexity for unsorted data and O(N) for sorted data.

Use defaultdict; it works for both sorted and unsorted data in O(N) time.
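For comparison, here is a sketch of both approaches side by side. The function names group_with_groupby and group_with_defaultdict are just illustrative, and both assume the item is the first column and the value is the last. The groupby version sorts first (stable, so each item's values keep their original order); the defaultdict version makes a single pass and needs no sorting.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

mylist = [['item1', 'some other stuff', 'value1'],
          ['item1', 'some other stuff', 'value2'],
          ['item2', 'some other stuff', 'value3'],
          ['item2', 'some other stuff', 'value4']]

def group_with_groupby(rows):
    # groupby only merges *consecutive* rows with equal keys, so sort by
    # the first column first: O(N log N). Drop sorted() if already sorted.
    key = itemgetter(0)
    return [[item] + [row[-1] for row in group]
            for item, group in groupby(sorted(rows, key=key), key=key)]

def group_with_defaultdict(rows):
    # One pass over the data, O(N); works whether or not rows are sorted.
    values = defaultdict(list)
    for row in rows:
        values[row[0]].append(row[-1])
    return [[item] + vals for item, vals in values.items()]

print(group_with_groupby(mylist))
print(group_with_defaultdict(mylist))
```

On the question's data both print [['item1', 'value1', 'value2'], ['item2', 'value3', 'value4']]. On unsorted input they differ only in item order: the groupby version returns items in sorted order, the defaultdict version in order of first appearance.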