How can I join a table to itself in T-SQL?

2019-04-16 04:04发布

问题:

I have the following query:

SELECT [ApplicationTitle] AS "ApplicationTitle", 
SUM([Visits]) AS "Visits",
SUM([NewVisits]) AS "NewVisits"
FROM [HeadlineFigures] 
WHERE [DataDate] >= '01/06/2009'
AND [DataDate] < '08/06/2009'
GROUP BY [ApplicationTitle]
ORDER BY [ApplicationTitle]

This returns data for 7 days. I want the same again, in the same query for a different week so it should look more like:

SELECT a.[ApplicationTitle] AS "ApplicationTitle", 
SUM(a.[Visits]) AS "Visits",
SUM(b.[Visits]) AS "PVisits",
SUM(a.[NewVisits]) AS "NewVisits"
SUM(b.[NewVisits]) AS "PNewVisits"
FROM [HeadlineFigures] a JOIN [HeadlineFigures] b
ON a.[ApplicationTitle] = b.[ApplicationTitle]
WHERE a.[DataDate] >= '01/06/2009'
AND a.[DataDate] < '08/06/2009'
AND b.[DataDate] >= '01/05/2009'
AND b.[DataDate] < '08/05/2009'
GROUP BY a.[ApplicationTitle]
ORDER BY a.[ApplicationTitle]

The above query is not correct though. What am I doing wrong?

回答1:

Your joining then aggregating, resulting in wrong aggregates. Ie. if for a ApplicationTitle you have 2 records in first set and 2 in the second, the join will have 4 records so the SUM doubles. You need to aggregate first, then join:

SELECT a.[ApplicationTitle],
    a.Visits,
    a.NewVisists,
    b.Visits as PVisits,
    b.NewVisits as PNewVisits
    FROM (
    SELECT [ApplicationTitle] AS "ApplicationTitle", 
        SUM([Visits]) AS "Visits",
        SUM([NewVisits]) AS "NewVisits"
        FROM [HeadlineFigures] 
        WHERE [DataDate] >= '01/06/2009'
        AND [DataDate] < '08/06/2009'
        GROUP BY [ApplicationTitle]
    ) AS a JOIN (
    SELECT [ApplicationTitle] AS "ApplicationTitle", 
        SUM([Visits]) AS "Visits",
        SUM([NewVisits]) AS "NewVisits"
        FROM [HeadlineFigures] 
        WHERE [DataDate] >= '01/05/2009'
        AND [DataDate] < '08/05/2009'
        GROUP BY [ApplicationTitle]
    ) AS b ON a.[ApplicationTitle] = b.[ApplicationTitle]
        ORDER BY [ApplicationTitle]


回答2:

If I'm not missing something entirely, you are making things too complicated. Try this instead:

SELECT [ApplicationTitle] AS "ApplicationTitle",
       SUM(CASE WHEN [DataDate] >= '01/06/2009' AND [DataDate] < '08/06/2009' THEN [Visits] ELSE 0 END) AS "Visits",
       SUM(CASE WHEN [DataDate] >= '01/05/2009' AND [DataDate] < '08/05/2009' THEN  [Visits] ELSE 0 END) AS "PVisits",
       SUM(CASE WHEN [DataDate] >= '01/06/2009' AND [DataDate] < '08/06/2009' THEN [NewVisits] ELSE 0 END) AS "NewVisits",
       SUM(CASE WHEN [DataDate] >= '01/05/2009' AND [DataDate] < '08/05/2009' THEN [NewVisits] ELSE 0 END) AS "PNewVisits",
  FROM [HeadlineFigures]
 WHERE    ([DataDate] >= '01/06/2009' AND [DataDate] < '08/06/2009')
       OR ([DataDate] >= '01/05/2009' AND [DataDate] < '08/05/2009')
 GROUP BY [ApplicationTitle]
 ORDER BY [ApplicationTitle]

Edit: Oops, I misread the data range. I hope SQL server will do a good job using an index over the DataDate, otherwise the from clause would need to look something like:

FROM (SELECT * FROM [ApplicationTitle] WHERE [DataDate] >= '01/06/2009' AND [DataDate] < '08/06/2009'
      UNION
      SELECT * FROM [ApplicationTitle] WHERE [DataDate] >= '01/05/2009' AND [DataDate] < '08/05/2009') x


标签: tsql