Flattening a shallow list in Python [duplicate]

2018-12-30 23:36发布

This question already has an answer here:

Is there a simple way to flatten a list of iterables with a list comprehension, or failing that, what would you all consider to be the best way to flatten a shallow list like this, balancing performance and readability?

I tried to flatten such a list with a nested list comprehension, like this:

[image for image in menuitem for menuitem in list_of_menuitems]

But I get in trouble of the NameError variety there, because the name 'menuitem' is not defined. After googling and looking around on Stack Overflow, I got the desired results with a reduce statement:

reduce(list.__add__, map(lambda x: list(x), list_of_menuitems))

But this method is fairly unreadable because I need that list(x) call there because x is a Django QuerySet object.

Conclusion:

Thanks to everyone who contributed to this question. Here is a summary of what I learned. I'm also making this a community wiki in case others want to add to or correct these observations.

My original reduce statement is redundant and is better written this way:

>>> reduce(list.__add__, (list(mi) for mi in list_of_menuitems))

This is the correct syntax for a nested list comprehension (Brilliant summary dF!):

>>> [image for mi in list_of_menuitems for image in mi]

But neither of these methods are as efficient as using itertools.chain:

>>> from itertools import chain
>>> list(chain(*list_of_menuitems))

And as @cdleary notes, it's probably better style to avoid * operator magic by using chain.from_iterable like so:

>>> chain = itertools.chain.from_iterable([[1,2],[3],[5,89],[],[6]])
>>> print(list(chain))
>>> [1, 2, 3, 5, 89, 6]

23条回答
梦醉为红颜
2楼-- · 2018-12-31 00:10

This solution works for arbitrary nesting depths - not just the "list of lists" depth that some (all?) of the other solutions are limited to:

def flatten(x):
    result = []
    for el in x:
        if hasattr(el, "__iter__") and not isinstance(el, basestring):
            result.extend(flatten(el))
        else:
            result.append(el)
    return result

It's the recursion which allows for arbitrary depth nesting - until you hit the maximum recursion depth, of course...

查看更多
只若初见
3楼-- · 2018-12-31 00:12

There seems to be a confusion with operator.add! When you add two lists together, the correct term for that is concat, not add. operator.concat is what you need to use.

If you're thinking functional, it is as easy as this::

>>> list2d = ((1,2,3),(4,5,6), (7,), (8,9))
>>> reduce(operator.concat, list2d)
(1, 2, 3, 4, 5, 6, 7, 8, 9)

You see reduce respects the sequence type, so when you supply a tuple, you get back a tuple. let's try with a list::

>>> list2d = [[1,2,3],[4,5,6], [7], [8,9]]
>>> reduce(operator.concat, list2d)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

Aha, you get back a list.

How about performance::

>>> list2d = [[1,2,3],[4,5,6], [7], [8,9]]
>>> %timeit list(itertools.chain.from_iterable(list2d))
1000000 loops, best of 3: 1.36 µs per loop

from_iterable is pretty fast! But it's no comparison to reduce with concat.

>>> list2d = ((1,2,3),(4,5,6), (7,), (8,9))
>>> %timeit reduce(operator.concat, list2d)
1000000 loops, best of 3: 492 ns per loop
查看更多
情到深处是孤独
4楼-- · 2018-12-31 00:12

From my experience, the most efficient way to flatten a list of lists is:

flat_list = []
map(flat_list.extend, list_of_list)

Some timeit comparisons with the other proposed methods:

list_of_list = [range(10)]*1000
%timeit flat_list=[]; map(flat_list.extend, list_of_list)
#10000 loops, best of 3: 119 µs per loop
%timeit flat_list=list(itertools.chain.from_iterable(list_of_list))
#1000 loops, best of 3: 210 µs per loop
%timeit flat_list=[i for sublist in list_of_list for i in sublist]
#1000 loops, best of 3: 525 µs per loop
%timeit flat_list=reduce(list.__add__,list_of_list)
#100 loops, best of 3: 18.1 ms per loop

Now, the efficiency gain appears better when processing longer sublists:

list_of_list = [range(1000)]*10
%timeit flat_list=[]; map(flat_list.extend, list_of_list)
#10000 loops, best of 3: 60.7 µs per loop
%timeit flat_list=list(itertools.chain.from_iterable(list_of_list))
#10000 loops, best of 3: 176 µs per loop

And this methods also works with any iterative object:

class SquaredRange(object):
    def __init__(self, n): 
        self.range = range(n)
    def __iter__(self):
        for i in self.range: 
            yield i**2

list_of_list = [SquaredRange(5)]*3
flat_list = []
map(flat_list.extend, list_of_list)
print flat_list
#[0, 1, 4, 9, 16, 0, 1, 4, 9, 16, 0, 1, 4, 9, 16]
查看更多
无色无味的生活
5楼-- · 2018-12-31 00:13
def flatten(items):
    for i in items:
        if hasattr(i, '__iter__'):
            for m in flatten(i):
                yield m
        else:
            yield i

Test:

print list(flatten2([1.0, 2, 'a', (4,), ((6,), (8,)), (((8,),(9,)), ((12,),(10)))]))
查看更多
情到深处是孤独
6楼-- · 2018-12-31 00:14

Performance Results. Revised.

import itertools
def itertools_flatten( aList ):
    return list( itertools.chain(*aList) )

from operator import add
def reduce_flatten1( aList ):
    return reduce(add, map(lambda x: list(x), [mi for mi in aList]))

def reduce_flatten2( aList ):
    return reduce(list.__add__, map(list, aList))

def comprehension_flatten( aList ):
    return list(y for x in aList for y in x)

I flattened a 2-level list of 30 items 1000 times

itertools_flatten     0.00554
comprehension_flatten 0.00815
reduce_flatten2       0.01103
reduce_flatten1       0.01404

Reduce is always a poor choice.

查看更多
栀子花@的思念
7楼-- · 2018-12-31 00:14

If you're looking for a built-in, simple, one-liner you can use:

a = [[1, 2, 3], [4, 5, 6]
b = [i[x] for i in a for x in range(len(i))]
print b

returns

[1, 2, 3, 4, 5, 6]
查看更多
登录 后发表回答