The following shows the distinct count is 2247
In [6]: VirtualMachineResources.objects.all().values('machine', 'cluster')
...: .distinct().count()
Out[6]: 2247
But then when I loop through it, it returned way more than it should:
In [4]: a = []
...: for resource in VirtualMachineResources.objects.all().values('mach
...: ine', 'cluster').distinct():
...: if resource['cluster']:
...: a.append(resource['cluster'])
...:
...:
In [5]: len(a)
Out[5]: 96953
Given the records have cluster field being empty, when I loop through the queryset, I can see a lot of repeats of the same machine without cluster value, but I would expect only once.
for resource in VirtualMachineResources.objects.all().values('machine', 'cluster').distinct():
print(resource['machine'], resource['cluster'])
prints...
server1
server1
server1
It's a postgres database. Any ideas? Having a few more digs on google and this seems related?
UPDATE: Created a Django issue here
You have a defined
VirtualMachineResources.Meta.ordering
that confuses the ORM as the ordered by columns must appear in theSELECT
clause when usingDISTINCT
. The ordering happens to be cleared when using.count()
.Long storing short, add a
.order_by()
to drop yourMeta.ordering
on the queryset you are iterating over and you should be good to go. There's no good way to generate aDISTINCT
that excludes ordering fields on Django's ORM yet as that would require a complex subquery pushdown as detailed in #24218.By the way, please avoid Django's ticket tracker as a second tier support channel when you are not receiving a reply to your question as fast as you'd like.