Inner Join and Group By in SQL with out an aggrega

2019-08-09 18:58发布

问题:

I am trying to retrieve data from 2 table with some conditions. When I just do the inner join with the conditions , I get a huge value (200000 data). But when I group by I get a very less value like (8000 data).

SELECT Tcg.SK_tID, Tcg.SK_ServiceProviderID
INTO #CHDetails 
FROM #ClientGroup Tcg           
INNER JOIN dbo.Component AS chd ON  chd.SK_PID = Tcg.SK_PID 
                                AND chd.SK_ServiceProviderID = Tcg.SK_ServiceProviderID
                                AND chd.SK_CompID = @CHD
                                AND chd.ReportDate < @ReportDate
GROUP BY Tcg.SK_PID ,Tcg.SK_ServiceProviderID

Can you please let me know the cause for this. Inner join always takes the common data. The data in the #ClientGroup table is around 70000 , while data in the dbo.Component is very huge. When I query for common PID and Service provider logically it shoul give me the records equal to or less than #ClientGroup. How is it giving more ?

When I do group by i get 8000. But why should I do group by in a inner join for 2 tables.

回答1:

The group by is essentially performing a distinct on the result. The reason you have to do this is likely because you have duplicates in both tables.

See this sqlfiddle: http://sqlfiddle.com/#!3/cbdca/2

In it, table1 has 3 rows and table2 has 3 rows. When joined together, they return 9 rows.



回答2:

if a join is giving you more records than expected that means that your join criterea is not complete. The extreme case for this would be a cartesian join.

check your joining criteria.



回答3:

Group by combines rows together. Usually it is done to produce aggregate data or produce a list of unique values. In your example, the #ClientGroup table is grouped together and the total number of rows you get returned shrinks. You should be getting less rows than are in #ClientGroup.



回答4:

It will not reduce the number of rows because The INNER JOIN does not perform DISTINCT instead INNER JOIN finds all matching data from both tables.

Let’s say you have table A and table B. both has column ID. Table A has total 3 rows in it and all values are unique. values are 1,2,3. TABLE B has total 300 rows, but only 3 unique values. Values are 1,2,3. (table B has 100 rows for each unique value)

Now in this example if I do INNER JOIN between Table A and Table B then every row from A will get join Every matching row in table B. So value 1 from Table A will match 100 rows in table B value 2 from Table A will match 100 rows in table B value 3 from Table A will match 100 rows in table B

so after doing INNER join I will get total 300 rows.

On the contrary if Table B has completely different values (other than 1,2,3) then INNER JOIN would has produced no result.