SQLAlchemy ORM returning a single column, how to

Published 2019-01-07 08:42

Question:

I'm using SQLAlchemy's ORM, and I find that when I return a single column the results come back like so:

[(result,), (result_2,)] # etc...

With a set like this I find that I have to do this often:

results = [r[0] for r in results] # So that I just have a list of result values

This isn't that "bad" because my result sets are usually small, but if they weren't it could add significant overhead. The bigger issue is that it clutters the source, and forgetting this step is a pretty common error I run into.

Is there any way to avoid this extra step?

A related aside: this behaviour of the ORM seems inconvenient here, but in another case, where my result set was [(id, value)], it ends up like this:

[(result_1_id, result_1_val), (result_2_id, result_2_val)]

I then can just do:

results = dict(results) # so I have a map of id to value

This one has the advantage of making sense as a useful step after returning the results.

Is this really a problem, or am I just nitpicking and the post-processing after getting the result set makes sense in both cases? I'm sure we can think of other common post-processing operations to make the result set more usable in the application code. Are there solutions that are both high-performance and convenient across the board, or is post-processing unavoidable and simply dictated by how each application uses the results?

When my application can actually take advantage of the objects returned by SQLAlchemy's ORM it seems extremely helpful, but in cases where I can't or don't, not so much. Is this just a common problem with ORMs in general? Am I better off not using the ORM layer in cases like this?

I suppose I should show an example of the actual ORM queries I'm talking about:

session.query(OrmObj.column_name).all()

or

session.query(OrmObj.id_column_name, OrmObj.value_column_name).all()

Of course, in a real query there'd normally be some filters, etc.
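For concreteness, here's a sketch of the full pattern I mean, filter included (OrmObj and its column names are just the illustrative stand-ins used above, not a real model):

values = (session.query(OrmObj.column_name)
          .filter(OrmObj.id_column_name > 0)  # illustrative filter
          .all())
values = [v[0] for v in values]  # the post-processing step in question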

Answer 1:

Python's zip combined with the * argument-unpacking operator is a pretty handy solution to this:

>>> results = [('result',), ('result_2',), ('result_3',)]
>>> zip(*results)
[('result', 'result_2', 'result_3')]
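
Note that on Python 3 zip returns an iterator rather than a list, so there you'd materialize it first to get the same output:

>>> list(zip(*results))
[('result', 'result_2', 'result_3')]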

Then you only have to index in with [0] once. For such a short list, though, your comprehension is faster:

>>> timeit('result = zip(*[("result",), ("result_2",), ("result_3",)])', number=10000)
0.010490894317626953
>>> timeit('result = [ result[0] for result in [("result",), ("result_2",), ("result_3",)] ]', number=10000)
0.0028390884399414062

However for longer lists zip should be faster:

>>> timeit('result = zip(*[(1,)]*100)', number=10000)
0.049577951431274414
>>> timeit('result = [ result[0] for result in [(1,)]*100 ]', number=10000)
0.11178708076477051

So it's up to you to determine which is better for your situation.



Answer 2:

One way to decrease the clutter in the source is to iterate like this:

results = [r for (r, ) in results]

Although this solution is one character longer than using the [] operator, I think it's easier on the eyes.

For even less clutter, remove the parentheses. This makes it harder to notice, when reading the code, that you're actually handling tuples, though:

results = [r for r, in results]
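
Applied to the ORM query from the question (reusing the OrmObj and column names given there), that becomes:

results = [value for (value, ) in session.query(OrmObj.column_name).all()]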


Answer 3:

I struggled with this too until I realized it's just like any other query:

for result in results:
    print(result.column_name)
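
This works because each row returned for a column selection is a named-tuple-like object, so the column is reachable by attribute as well as by index. Combined with a comprehension (again using the names from the question), the whole thing fits on one line:

results = [result.column_name for result in session.query(OrmObj.column_name).all()]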


Answer 4:

I found the following more readable; it also covers the dict case from the question (in Python 2.7):

d = {id_: name for id_, name in session.query(Customer.id, Customer.name).all()}
l = [r.id for r in session.query(Customer).all()]

For the single value, borrowing from another answer:

l = [name for (name, ) in session.query(Customer.name).all()]

Compare with the built-in zip solution from above, adapted to produce a list:

l = list(zip(*session.query(Customer.id).all())[0])

which in my timeits provides only about a 4% speed improvement.



Answer 5:

My solution looks like this ;)

def column(self):
    for column, *_ in Model.query.with_entities(Model.column).all():
        yield column

NOTE: Python 3 only, since it relies on extended unpacking (for column, *_ in ...).
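
Since the generator yields one value per row, using it is just a matter of iterating it or materializing it, for example (assuming an instance that owns this method, which the answer doesn't show):

values = list(instance.column())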



Answer 6:

Wow, guys, why strain? There's an even simpler way, faster and more elegant:

>>> results = [('result',), ('result_2',), ('result_3',)]
>>> sum(results, tuple())
('result', 'result_2', 'result_3')

Speed:

>>> timeit('result = zip(*[("result",), ("result_2",), ("result_3",)])', number=10000)
0.004222994000883773
>>> timeit('result = sum([("result",), ("result_2",), ("result_3",)], ())', number=10000)
0.0038205889868550003

But if the list has more elements, use zip instead: sum builds a new tuple for every addition, so it slows down quadratically, while zip stays faster as the list grows.