Python's list comprehension vs .NET LINQ

2019-01-16 02:55发布

问题:

The following simple LINQ code

string[] words = { "hello", "wonderful", "linq", "beautiful", "world" };

// Get only short words
var shortWords =
  from word in words
  where word.Length <= 5
  select word;

// Print each word out
shortWords.Dump();

can be translated into python using list comprehension as follows.

words = ["hello", "wonderful", "linq", "beautiful", "world"]
shortWords = [x for x in words if len(x) <=5]
print shortWords
  • Is LINQ just another idea to implement list comprehension?
  • What examples might be that LINQ can do but list comprehension can't do.

回答1:

(Warning: Mammoth answer ahead. The part up to the first horizontal line makes a good tl;dr section, I suppose)

I'm not sure if I qualify as Python guru... but I have a solid grasp on iteration in Python, so let's try :)

First off: Afaik, LINQ queries are executed lazily - if that's the case, generator expressions are a closer Python concept (either way, list-, dict- and set comprehensions are conceptually just generator expressions fed to the list/dict/set constructor!).

Also, there is a conceptual difference: LINQ is for, as the name says, querying data structures. List-/dict-/set comprehensions are possible application of this (e.g. filtering and projecting the items of a list). So they are in fact less general (as we will see, many things built into LINQ are not built into them). Likewise, generator expressions are a way to formulate an one-time forward iterator in-place (I like to think of it as lambda for generator functions, only without an ugly, long keyword ;) ) and not a way to describe a complex query. They overlap, yes, but they are not identical. If you want all the power of LINQ in Python, you will have to write a fully-fledged generator. Or combine the numerous powerful generators built-in and in itertools.


Now, Python counterparts for the LINQ capabilities Jon Skeet named:

Projections: (x.foo for ...)

Filtering: (... if x.bar > 5)

  • Joins (x join y on x.foo equals y.bar)

The closest thing would be((x_item, next(y_item for y_item in y if x_item.foo == y_item.bar)) for x_item in x), I suppose.

Note that this will not iterate over the whole y for each x_item, it will only get the first match.

  • Group joins (x join y on x.foo equals y.bar into g)

This is harder. Python doesn't have anonymous types, though they are trivial to do yourself if you don't mind messing with __dict__:

class Anonymous(object):
    def __init__(self, **kwargs):
        self.__dict__ = kwargs

Then, we could do (Anonymous(x=x, y=y) for ...) to get a list of objects that are have x and y members with the respective values. The right thing is usually feeding the results to the constructor of an approriate class, say, XY.

  • Grouping (group x.foo by x.bar)

Now it gets hairy... there is no build-in way, afaik. But we can define it ourself if we need it:

from collections import defaultdict

def group_by(iterable, group_func):
    groups = defaultdict(list)
    for item in iterable:
        groups[group_func(item)].append(item)
    return groups

Example:

>>> from operator import attrgetter
>>> group_by((x.foo for x in ...), attrgetter('bar'))
defaultdict(<class 'list'>, {some_value_of_bar: [x.foo of all x where x.bar == some_value_of_bar], some_other_value_of_bar: [...], ...})

This requires whatever we group by to be hashable, though. It's possible to avoid this, and I'll make a stab if there is public demand. But for now, I'm being lazy :)

We can also just return an iterable of groups without the values we grouped by, by calling .values() on the result (of course we can feed that to list to get something we can index and iterate several times). But who knows if we won't need the group values...

  • Ordering (orderby x.foo ascending, y.bar descending)

Sorting needs special syntax? The build-in sorted works for iterables, too: sorted(x % 2 for x in range(10)) or sorted(x for x in xs, key=attrgetter('foo')). Sorted ascending by default, the keyword argument reverse gives descending order.

Alas, afaik sorting by multiple attributes is not that easy, especially when mixing ascending and descending. Hmm... topic for a recipe?

  • Intermediate variables (let tmp = x.foo)

No, not possible in comprehensions or generator expressions - they are, as the name says, supposed to be expressions (and usually only span one or two lines). It's perfectly possible in generator function, though:

(x * 2 for x in iterable)

rewritten as generator with intermediate variable:

def doubles(iterable):
    for x in iterable:
        times2 = x * 2
        yield times2

Flattening: (c for s in ("aa","bb") for c in s )


Note that although LINQ to Objects deals with delegates, other query providers (e.g. LINQ to SQL) can deal in expression trees which describe the query instead of just presenting executable delegates. This allows the query to be translated into SQL (or other query languages) - again, I don't know whether Python supports that sort of thing or not. It's a significant part of LINQ though.

Python definitely does no such thing. List expressions correspond one-to-one to accumulating a plain list in a (possibly nested) for-loop, generator expressions correspond one-to-one to a generator. Given the parser and ast module, it would be possible in theory to write a library for converting a comprehension into e.g. an SQL query. But nobody cares to.



回答2:

Well, you need to distinguish between some different things:

  • LINQ standard query operators
  • LINQ query expressions in C#
  • LINQ query expressions in VB

C# doesn't support as much in query expressions as VB does, but here's what it does support:

  • Projections (select x.foo)
  • Filtering (where x.bar > 5)
  • Joins (x join y on x.foo equals y.bar)
  • Group joins (x join y on x.foo equals y.bar into g)
  • Grouping (group x.foo by x.bar)
  • Ordering (orderby x.foo ascending, y.bar descending)
  • Intermediate variables (let tmp = x.foo)
  • Flattening (from x in y from z in x)

I don't know how many of those are supported directly in Python's list comprehensions.

Note that although LINQ to Objects deals with delegates, other query providers (e.g. LINQ to SQL) can deal in expression trees which describe the query instead of just presenting executable delegates. This allows the query to be translated into SQL (or other query languages) - again, I don't know whether Python supports that sort of thing or not. It's a significant part of LINQ though.



回答3:

By using the asq Python package you can easily do most things in Python that you can do in C# using LINQ-for-objects. Using asq, your Python example becomes:

from asq.initiators import query
words = ["hello", "wonderful", "linq", "beautiful", "world"]
shortWords = query(words).where(lambda x: len(x) <= 5)


回答4:

I am no Python guru, but I would say that Python actually supports all of them as you can nest list comprehensions and include all the lambda expressions that you want. (list comprehensions tend to be hard to read if they get too complex, though...), but no it includes no "specific syntax" to accomplish all that.

Most of the functionality can be reproduced using : - list comprehensions or generators - lambda-functions or builtin functions (like filter() or map()) or functions from the itertools module

For instance, if you want to copy the behaviour of :

  • Projections : that would be the left part of a list-comprehension ... which can be single values but also tuples. ex : [ (k,v) for k,v in my_dict.items() if k.startswith("abc"]. You may also use map()
  • Filtering : that would be the expression on the right, after the if. You may also use filter()
  • Ordering : just use the built-in sorted()
  • Grouping or aggregates : use the builtin min(), max() or itertools.groupby()

Regarding joins or flattening, I think you would have to "to do it by hand"...

(Always good to have the Python Quick Reference at reach )