I'm running into a problem that's driving me nuts. When running the query below, I get a count of 233,769
SELECT COUNT(distinct Member_List_Link.UserID)
FROM Member_List_Link with (nolock)
INNER JOIN MasterMembers with (nolock)
ON Member_List_Link.UserID = MasterMembers.UserID
WHERE MasterMembers.Active = 1 And
Member_List_Link.GroupID = 5 AND
MasterMembers.ValidUsers = 1 AND
Member_List_Link.Status = 1
But if I run the same query without the distinct keyword, I get a count of 233,748
SELECT COUNT(Member_List_Link.UserID)
FROM Member_List_Link with (nolock)
INNER JOIN MasterMembers with (nolock)
ON Member_List_Link.UserID = MasterMembers.UserID
WHERE MasterMembers.Active = 1 And Member_List_Link.GroupID = 5
AND MasterMembers.ValidUsers = 1 AND Member_List_Link.Status = 1
To test, I recreated all the tables and place them into temp tables and ran the queries again:
SELECT COUNT(distinct #Temp_Member_List_Link.UserID)
FROM #Temp_Member_List_Link with (nolock)
INNER JOIN #Temp_MasterMembers with (nolock)
ON #Temp_Member_List_Link.UserID = #Temp_MasterMembers.UserID
WHERE #Temp_MasterMembers.Active = 1 And
#Temp_Member_List_Link.GroupID = 5 AND
#Temp_MasterMembers.ValidUsers = 1 AND
#Temp_Member_List_Link.Status = 1
And without the distinct keyword
SELECT COUNT(#Temp_Member_List_Link.UserID)
FROM #Temp_Member_List_Link with (nolock)
INNER JOIN #Temp_MasterMembers with (nolock)
ON #Temp_Member_List_Link.UserID = #Temp_MasterMembers.UserID
WHERE #Temp_MasterMembers.Active = 1 And
#Temp_Member_List_Link.GroupID = 5 AND
#Temp_MasterMembers.ValidUsers = 1 AND
#Temp_Member_List_Link.Status = 1
On a side note, I recreated the temp tables by simply running (select * from Member_List_Link into #temp...
)
And now when I check to see the difference between COUNT(column) vs. COUNT(distinct column) with these temp tables, I don't see any!
So why is there a discrepancy with the original tables?
I'm running SQL Server 2008 (Dev Edition).
UPDATE - Including statistics profile
PhysicalOp column only for the first query (without distinct)
NULL
Compute Scalar
Stream Aggregate
Clustered Index Seek
PhysicalOp column only for the first query (with distinct)
NULL
Compute Scalar
Stream Aggregate
Parallelism
Stream Aggregate
Hash Match
Hash Match
Bitmap
Parallelism
Index Seek
Parallelism
Clustered Index Scan
Rows and Executes for the 1st query (without distinct)
1 1
0 0
1 1
1 1
Rows and Executes for the 2nd query (with distinct)
Rows Executes
1 1
0 0
1 1
16 1
16 16
233767 16
233767 16
281901 16
281901 16
281901 16
234787 16
234787 16
Adding OPTION(MAXDOP 1) to the 2nd query (with distinct)
Rows Executes
1 1
0 0
1 1
233767 1
233767 1
281901 1
548396 1
And the resulting PhysicalOp
NULL
Compute Scalar
Stream Aggregate
Hash Match
Hash Match
Index Seek
Clustered Index Scan