Django distinct returns more records than count

2019-08-29 08:14发布

The following shows the distinct count is 2247

In [6]: VirtualMachineResources.objects.all().values('machine', 'cluster')
   ...: .distinct().count()                                               
Out[6]: 2247

But then when I loop through it, it returned way more than it should:

In [4]: a = [] 
   ...: for resource in VirtualMachineResources.objects.all().values('mach
   ...: ine', 'cluster').distinct(): 
   ...:     if resource['cluster']: 
   ...:         a.append(resource['cluster']) 
   ...:          
   ...:                                                                   

In [5]: len(a)                                                            
Out[5]: 96953

Given the records have cluster field being empty, when I loop through the queryset, I can see a lot of repeats of the same machine without cluster value, but I would expect only once.

for resource in VirtualMachineResources.objects.all().values('machine', 'cluster').distinct(): 
    print(resource['machine'], resource['cluster'])

prints...

server1
server1
server1

It's a postgres database. Any ideas? Having a few more digs on google and this seems related?

UPDATE: Created a Django issue here

1条回答
来,给爷笑一个
2楼-- · 2019-08-29 08:31

You have a defined VirtualMachineResources.Meta.ordering that confuses the ORM as the ordered by columns must appear in the SELECT clause when using DISTINCT. The ordering happens to be cleared when using .count().

Long storing short, add a .order_by() to drop your Meta.ordering on the queryset you are iterating over and you should be good to go. There's no good way to generate a DISTINCT that excludes ordering fields on Django's ORM yet as that would require a complex subquery pushdown as detailed in #24218.

By the way, please avoid Django's ticket tracker as a second tier support channel when you are not receiving a reply to your question as fast as you'd like.

查看更多
登录 后发表回答