Flattening a shallow list in Python [duplicate]

2018-12-30 23:36发布

This question already has an answer here:

Is there a simple way to flatten a list of iterables with a list comprehension, or failing that, what would you all consider to be the best way to flatten a shallow list like this, balancing performance and readability?

I tried to flatten such a list with a nested list comprehension, like this:

[image for image in menuitem for menuitem in list_of_menuitems]

But I get in trouble of the NameError variety there, because the name 'menuitem' is not defined. After googling and looking around on Stack Overflow, I got the desired results with a reduce statement:

reduce(list.__add__, map(lambda x: list(x), list_of_menuitems))

But this method is fairly unreadable because I need that list(x) call there because x is a Django QuerySet object.

Conclusion:

Thanks to everyone who contributed to this question. Here is a summary of what I learned. I'm also making this a community wiki in case others want to add to or correct these observations.

My original reduce statement is redundant and is better written this way:

>>> reduce(list.__add__, (list(mi) for mi in list_of_menuitems))

This is the correct syntax for a nested list comprehension (Brilliant summary dF!):

>>> [image for mi in list_of_menuitems for image in mi]

But neither of these methods are as efficient as using itertools.chain:

>>> from itertools import chain
>>> list(chain(*list_of_menuitems))

And as @cdleary notes, it's probably better style to avoid * operator magic by using chain.from_iterable like so:

>>> chain = itertools.chain.from_iterable([[1,2],[3],[5,89],[],[6]])
>>> print(list(chain))
>>> [1, 2, 3, 5, 89, 6]

23条回答
不流泪的眼
2楼-- · 2018-12-31 00:15

In Python 3.4 you will be able to do:

[*innerlist for innerlist in outer_list]
查看更多
路过你的时光
3楼-- · 2018-12-31 00:16

Here is the correct solution using list comprehensions (they're backward in the question):

>>> join = lambda it: (y for x in it for y in x)
>>> list(join([[1,2],[3,4,5],[]]))
[1, 2, 3, 4, 5]

In your case it would be

[image for menuitem in list_of_menuitems for image in menuitem.image_set.all()]

or you could use join and say

join(menuitem.image_set.all() for menuitem in list_of_menuitems)

In either case, the gotcha was the nesting of the for loops.

查看更多
几人难应
4楼-- · 2018-12-31 00:18

@S.Lott: You inspired me to write a timeit app.

I figured it would also vary based on the number of partitions (number of iterators within the container list) -- your comment didn't mention how many partitions there were of the thirty items. This plot is flattening a thousand items in every run, with varying number of partitions. The items are evenly distributed among the partitions.

Flattening Comparison

Code (Python 2.6):

#!/usr/bin/env python2.6

"""Usage: %prog item_count"""

from __future__ import print_function

import collections
import itertools
import operator
from timeit import Timer
import sys

import matplotlib.pyplot as pyplot

def itertools_flatten(iter_lst):
    return list(itertools.chain(*iter_lst))

def itertools_iterable_flatten(iter_iter):
    return list(itertools.chain.from_iterable(iter_iter))

def reduce_flatten(iter_lst):
    return reduce(operator.add, map(list, iter_lst))

def reduce_lambda_flatten(iter_lst):
    return reduce(operator.add, map(lambda x: list(x), [i for i in iter_lst]))

def comprehension_flatten(iter_lst):
    return list(item for iter_ in iter_lst for item in iter_)

METHODS = ['itertools', 'itertools_iterable', 'reduce', 'reduce_lambda',
           'comprehension']

def _time_test_assert(iter_lst):
    """Make sure all methods produce an equivalent value.
    :raise AssertionError: On any non-equivalent value."""
    callables = (globals()[method + '_flatten'] for method in METHODS)
    results = [callable(iter_lst) for callable in callables]
    if not all(result == results[0] for result in results[1:]):
        raise AssertionError

def time_test(partition_count, item_count_per_partition, test_count=10000):
    """Run flatten methods on a list of :param:`partition_count` iterables.
    Normalize results over :param:`test_count` runs.
    :return: Mapping from method to (normalized) microseconds per pass.
    """
    iter_lst = [[dict()] * item_count_per_partition] * partition_count
    print('Partition count:    ', partition_count)
    print('Items per partition:', item_count_per_partition)
    _time_test_assert(iter_lst)
    test_str = 'flatten(%r)' % iter_lst
    result_by_method = {}
    for method in METHODS:
        setup_str = 'from test import %s_flatten as flatten' % method
        t = Timer(test_str, setup_str)
        per_pass = test_count * t.timeit(number=test_count) / test_count
        print('%20s: %.2f usec/pass' % (method, per_pass))
        result_by_method[method] = per_pass
    return result_by_method

if __name__ == '__main__':
    if len(sys.argv) != 2:
        raise ValueError('Need a number of items to flatten')
    item_count = int(sys.argv[1])
    partition_counts = []
    pass_times_by_method = collections.defaultdict(list)
    for partition_count in xrange(1, item_count):
        if item_count % partition_count != 0:
            continue
        items_per_partition = item_count / partition_count
        result_by_method = time_test(partition_count, items_per_partition)
        partition_counts.append(partition_count)
        for method, result in result_by_method.iteritems():
            pass_times_by_method[method].append(result)
    for method, pass_times in pass_times_by_method.iteritems():
        pyplot.plot(partition_counts, pass_times, label=method)
    pyplot.legend()
    pyplot.title('Flattening Comparison for %d Items' % item_count)
    pyplot.xlabel('Number of Partitions')
    pyplot.ylabel('Microseconds')
    pyplot.show()

Edit: Decided to make it community wiki.

Note: METHODS should probably be accumulated with a decorator, but I figure it'd be easier for people to read this way.

查看更多
素衣白纱
5楼-- · 2018-12-31 00:18

have you tried flatten? From matplotlib.cbook.flatten(seq, scalarp=) ?

l=[[1,2,3],[4,5,6], [7], [8,9]]*33

run("list(flatten(l))")
         3732 function calls (3303 primitive calls) in 0.007 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.007    0.007 <string>:1(<module>)
      429    0.001    0.000    0.001    0.000 cbook.py:475(iterable)
      429    0.002    0.000    0.003    0.000 cbook.py:484(is_string_like)
      429    0.002    0.000    0.006    0.000 cbook.py:565(is_scalar_or_string)
  727/298    0.001    0.000    0.007    0.000 cbook.py:605(flatten)
      429    0.000    0.000    0.001    0.000 core.py:5641(isMaskedArray)
      858    0.001    0.000    0.001    0.000 {isinstance}
      429    0.000    0.000    0.000    0.000 {iter}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



l=[[1,2,3],[4,5,6], [7], [8,9]]*66

run("list(flatten(l))")
         7461 function calls (6603 primitive calls) in 0.007 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.007    0.007 <string>:1(<module>)
      858    0.001    0.000    0.001    0.000 cbook.py:475(iterable)
      858    0.002    0.000    0.003    0.000 cbook.py:484(is_string_like)
      858    0.002    0.000    0.006    0.000 cbook.py:565(is_scalar_or_string)
 1453/595    0.001    0.000    0.007    0.000 cbook.py:605(flatten)
      858    0.000    0.000    0.001    0.000 core.py:5641(isMaskedArray)
     1716    0.001    0.000    0.001    0.000 {isinstance}
      858    0.000    0.000    0.000    0.000 {iter}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



l=[[1,2,3],[4,5,6], [7], [8,9]]*99

run("list(flatten(l))")
         11190 function calls (9903 primitive calls) in 0.010 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.010    0.010 <string>:1(<module>)
     1287    0.002    0.000    0.002    0.000 cbook.py:475(iterable)
     1287    0.003    0.000    0.004    0.000 cbook.py:484(is_string_like)
     1287    0.002    0.000    0.009    0.000 cbook.py:565(is_scalar_or_string)
 2179/892    0.001    0.000    0.010    0.000 cbook.py:605(flatten)
     1287    0.001    0.000    0.001    0.000 core.py:5641(isMaskedArray)
     2574    0.001    0.000    0.001    0.000 {isinstance}
     1287    0.000    0.000    0.000    0.000 {iter}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



l=[[1,2,3],[4,5,6], [7], [8,9]]*132

run("list(flatten(l))")
         14919 function calls (13203 primitive calls) in 0.013 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.013    0.013 <string>:1(<module>)
     1716    0.002    0.000    0.002    0.000 cbook.py:475(iterable)
     1716    0.004    0.000    0.006    0.000 cbook.py:484(is_string_like)
     1716    0.003    0.000    0.011    0.000 cbook.py:565(is_scalar_or_string)
2905/1189    0.002    0.000    0.013    0.000 cbook.py:605(flatten)
     1716    0.001    0.000    0.001    0.000 core.py:5641(isMaskedArray)
     3432    0.001    0.000    0.001    0.000 {isinstance}
     1716    0.001    0.000    0.001    0.000 {iter}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler'

UPDATE Which gave me another idea:

l=[[1,2,3],[4,5,6], [7], [8,9]]*33

run("flattenlist(l)")
         564 function calls (432 primitive calls) in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    133/1    0.000    0.000    0.000    0.000 <ipython-input-55-39b139bad497>:4(flattenlist)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
      429    0.000    0.000    0.000    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



l=[[1,2,3],[4,5,6], [7], [8,9]]*66

run("flattenlist(l)")
         1125 function calls (861 primitive calls) in 0.001 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    265/1    0.001    0.000    0.001    0.001 <ipython-input-55-39b139bad497>:4(flattenlist)
        1    0.000    0.000    0.001    0.001 <string>:1(<module>)
      858    0.000    0.000    0.000    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



l=[[1,2,3],[4,5,6], [7], [8,9]]*99

run("flattenlist(l)")
         1686 function calls (1290 primitive calls) in 0.001 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    397/1    0.001    0.000    0.001    0.001 <ipython-input-55-39b139bad497>:4(flattenlist)
        1    0.000    0.000    0.001    0.001 <string>:1(<module>)
     1287    0.000    0.000    0.000    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



l=[[1,2,3],[4,5,6], [7], [8,9]]*132

run("flattenlist(l)")
         2247 function calls (1719 primitive calls) in 0.002 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    529/1    0.001    0.000    0.002    0.002 <ipython-input-55-39b139bad497>:4(flattenlist)
        1    0.000    0.000    0.002    0.002 <string>:1(<module>)
     1716    0.001    0.000    0.001    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



l=[[1,2,3],[4,5,6], [7], [8,9]]*1320

run("flattenlist(l)")
         22443 function calls (17163 primitive calls) in 0.016 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   5281/1    0.011    0.000    0.016    0.016 <ipython-input-55-39b139bad497>:4(flattenlist)
        1    0.000    0.000    0.016    0.016 <string>:1(<module>)
    17160    0.005    0.000    0.005    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

So to test how effective it is when recursive gets deeper: How much deeper?

l=[[1,2,3],[4,5,6], [7], [8,9]]*1320

new=[l]*33

run("flattenlist(new)")
         740589 function calls (566316 primitive calls) in 0.418 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 174274/1    0.281    0.000    0.417    0.417 <ipython-input-55-39b139bad497>:4(flattenlist)
        1    0.001    0.001    0.418    0.418 <string>:1(<module>)
   566313    0.136    0.000    0.136    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



new=[l]*66

run("flattenlist(new)")
         1481175 function calls (1132629 primitive calls) in 0.809 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 348547/1    0.542    0.000    0.807    0.807 <ipython-input-55-39b139bad497>:4(flattenlist)
        1    0.002    0.002    0.809    0.809 <string>:1(<module>)
  1132626    0.266    0.000    0.266    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



new=[l]*99

run("flattenlist(new)")
         2221761 function calls (1698942 primitive calls) in 1.211 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 522820/1    0.815    0.000    1.208    1.208 <ipython-input-55-39b139bad497>:4(flattenlist)
        1    0.002    0.002    1.211    1.211 <string>:1(<module>)
  1698939    0.393    0.000    0.393    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



new=[l]*132

run("flattenlist(new)")
         2962347 function calls (2265255 primitive calls) in 1.630 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 697093/1    1.091    0.000    1.627    1.627 <ipython-input-55-39b139bad497>:4(flattenlist)
        1    0.003    0.003    1.630    1.630 <string>:1(<module>)
  2265252    0.536    0.000    0.536    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



new=[l]*1320

run("flattenlist(new)")
         29623443 function calls (22652523 primitive calls) in 16.103 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
6970921/1   10.842    0.000   16.069   16.069 <ipython-input-55-39b139bad497>:4(flattenlist)
        1    0.034    0.034   16.103   16.103 <string>:1(<module>)
 22652520    5.227    0.000    5.227    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

I will bet "flattenlist" I am going to use this rather than matploblib for a long long time unless I want a yield generator and fast result as "flatten" uses in matploblib.cbook

This, is fast.

  • And here is the code

:

typ=(list,tuple)


def flattenlist(d):
    thelist = []
    for x in d:
        if not isinstance(x,typ):
            thelist += [x]
        else:
            thelist += flattenlist(x)
    return thelist
查看更多
听够珍惜
6楼-- · 2018-12-31 00:21

pylab provides a flatten: link to numpy flatten

查看更多
几人难应
7楼-- · 2018-12-31 00:23

sum(list_of_lists, []) would flatten it.

l = [['image00', 'image01'], ['image10'], []]
print sum(l,[]) # prints ['image00', 'image01', 'image10']
查看更多
登录 后发表回答