Python: Nested List Modification

Posted 2019-07-31 09:01

I have a nested list of paired data in the format:

mylist = [['item1', 'some other stuff', 'value1'],['item1', 'some other stuff', 'value2'],['item2', 'some other stuff', 'value3'],['item2', 'some other stuff', 'value4']]

I have no idea how to do this, but I need the list to be grouped like this:

[['item1', 'value1', 'value2'], ['item2', 'value3', 'value4']]

So for my list of items, all of the values should be grouped with their corresponding item if the item is repeated multiple times in the list with different values.

Any help would be greatly appreciated.

Thanks

2 Answers
闹够了就滚
Reply #2 · 2019-07-31 09:35

Let's start out by using a dictionary to map items to lists of values. That's going to be a lot easier (and faster) than a list, because finding the list to append a new value to is just mydict[item], instead of having to write some kind of linear-search function.

mydict = {}
for item, otherstuff, value in mylist:
    mydict.setdefault(item, []).append(value)

This gives you:

{'item1': ['value1', 'value2'], 'item2': ['value3', 'value4']}

Now we can convert that dictionary back to a list, if you want:

groupedlist = [[k] + v for k, v in mydict.items()]

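On Python 3.7 and later, where dicts preserve insertion order, this gives you:

[['item1', 'value1', 'value2'], ['item2', 'value3', 'value4']]

The big downside on older Python versions is that once you stick things into a plain dict, you lose the original order, so you might get item2 first instead. If you were expecting item1 to come first because its first entry came before item2's first entry, and you have to support those versions, you can use an OrderedDict, which keeps keys in the order they were first inserted. If you wanted some other order (say, by each item's last entry), you would have to handle that yourself.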
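For older Python versions, the OrderedDict variant mentioned above might look roughly like this. It is only a sketch that reuses mylist from the question; the names grouped and groupedlist are just illustrative.

```python
from collections import OrderedDict

mylist = [['item1', 'some other stuff', 'value1'],
          ['item1', 'some other stuff', 'value2'],
          ['item2', 'some other stuff', 'value3'],
          ['item2', 'some other stuff', 'value4']]

# OrderedDict remembers the order in which keys were first inserted,
# so groups come out in first-seen order on any Python version.
grouped = OrderedDict()
for item, _otherstuff, value in mylist:
    grouped.setdefault(item, []).append(value)

groupedlist = [[k] + v for k, v in grouped.items()]
print(groupedlist)
# [['item1', 'value1', 'value2'], ['item2', 'value3', 'value4']]
```

The loop is identical to the plain-dict version; only the container type changes.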

The big upside is that often, you actually want a dictionary in the end, not a list.

The smaller upside is that, if your data aren't sorted, groupby(…sorted(…)) requires an O(NlogN) sort, while this solution is O(N). Usually, that won't make a difference. And if it does, the constant-factor differences for a given Python implementation and platform might outweigh the differences anyway. But if the performance matters, test both solutions and use the faster one.
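If you do want to test both, something like the following timeit sketch could be a starting point. The function names and the made-up 10,000-row data set are assumptions for illustration, not part of the question, and the timings will vary by machine and Python version.

```python
import timeit
from itertools import groupby
from operator import itemgetter


def group_with_dict(rows):
    # O(N): one pass, appending each value to its item's list
    grouped = {}
    for item, _otherstuff, value in rows:
        grouped.setdefault(item, []).append(value)
    return grouped


def group_with_groupby(rows):
    # O(NlogN): sort by the first element, then group consecutive rows
    first = itemgetter(0)
    return {k: [r[-1] for r in g]
            for k, g in groupby(sorted(rows, key=first), key=first)}


# Made-up data: 10,000 rows spread over 100 items, not sorted by item
rows = [['item%d' % (i % 100), 'some other stuff', 'value%d' % i]
        for i in range(10000)]

# Both approaches should agree before their speed is compared
assert group_with_dict(rows) == group_with_groupby(rows)

for fn in (group_with_dict, group_with_groupby):
    seconds = timeit.timeit(lambda: fn(rows), number=20)
    print(fn.__name__, round(seconds, 4))
```

Run it against data shaped like your real input; the relative result may differ for small lists or already-sorted data.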

仙女界的扛把子
Reply #3 · 2019-07-31 09:42

You could use itertools.groupby. It only groups consecutive items, so if the list is not already sorted by the first element, you have to sort it first. That makes it O(NlogN) for unsorted data and O(N) for sorted data.

>>> from itertools import groupby
>>> [[k]+[x[-1] for x in v] for k,v in groupby(mylist,key=lambda x:x[0])]
[['item1', 'value1', 'value2'], ['item2', 'value3', 'value4']]
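For data that is not already sorted by the first element, a sketch of the sort-then-group variant could look like this. The name unsorted_list and its contents are made up for illustration; operator.itemgetter(0) is used in place of the lambda, which is just a stylistic choice.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input: same rows as the question, but interleaved
unsorted_list = [
    ['item2', 'some other stuff', 'value3'],
    ['item1', 'some other stuff', 'value1'],
    ['item2', 'some other stuff', 'value4'],
    ['item1', 'some other stuff', 'value2'],
]

first = itemgetter(0)

# sorted() is stable, so rows with the same key keep their relative order
grouped = [[k] + [row[-1] for row in rows]
           for k, rows in groupby(sorted(unsorted_list, key=first), key=first)]
print(grouped)
# [['item1', 'value1', 'value2'], ['item2', 'value3', 'value4']]
```

Note that the groups come out in sorted key order here, not in the order the items first appeared.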

Alternatively, use defaultdict; it works for both sorted and unsorted data in O(N) time. The output below is for Python 3.7+, where dicts keep insertion order; older versions may return the groups in a different order.

>>> from collections import defaultdict
>>> dic = defaultdict(list)
>>> for x in mylist:
...     key = x[0]
...     dic[key].append(x[-1])
...
>>> [[k] + v for k, v in dic.items()]
[['item1', 'value1', 'value2'], ['item2', 'value3', 'value4']]