Distinct rows from three tables using joins

Posted 2019-08-16 05:13

Question:

I have three tables related to the article section of my website. I need to show the top authors based on the number of times each author's articles were read. I use three basic tables to store this information.

Articles holds all the details related to articles, author information is stored in Authors, and when a user views a particular article I update or insert a record in Popularity.

Below is sample data:

Articles

ArticleID  Title             Desc  AuthorID
---------  ----------------  ----  --------
1          Article One       ....  100
2          Article Two       ....  200
3          Article Three     ....  100
4          Article Four      ....  300
5          Article Five      ....  100
6          Article Six       ....  300
7          Article Seven     ....  500
8          Article Eight     ....  100
9          Article Nine      ....  600

Authors

AuthorID  AuthorName
--------  ------------
100       Author One
200       Author Two
300       Author Three
400       Author Four
500       Author Five
600       Author Six

Popularity

ID  ArticleID  Hits
--  ---------  ----
1   1          20
2   2          50
3   5          100
4   3          11
5   4          21

I am trying to use the following query to get the top 10 authors:

SELECT TOP 10    AuthorID 
      ,au.AuthorName
      ,ArticleHits
      ,SUM(ArticleHits) 
FROM Authors au 
JOIN Articles ar
  ON au.AuthorID = ar.ArticleAuthorID
JOIN  Popularity ap
  ON ap.ArticleID = ar.ArticleID
GROUP BY AuthorID,1,1,1

But this generates the following error:

Msg 164, Level 15, State 1, Line 12
Each GROUP BY expression must contain at least one column that is not an outer reference.

Answer 1:

SQL Server requires that every column in the SELECT list either appear in the GROUP BY clause or be wrapped in an aggregate function. The following query appears to work; note the GROUP BY au.AuthorID, au.AuthorName, which covers both columns in the SELECT list that are not inside an aggregate function:

SELECT TOP 10 au.AuthorID
      ,au.AuthorName
      ,SUM(Hits) TotalHits
FROM Authors au
JOIN Articles ar
  ON au.AuthorID = ar.AuthorID
JOIN Popularity ap
  ON ap.ArticleID = ar.ArticleID
GROUP BY au.AuthorID, au.AuthorName
ORDER BY TotalHits DESC

See SQL Fiddle with Demo.
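
To check the query against the sample data from the question, here is a minimal setup sketch. It assumes SQL Server 2008 or later (for the multi-row VALUES syntax) and must run as a single batch. The table-variable names are illustrative only, and the Desc column is left out because DESC is a reserved word that would need bracket quoting.

```sql
-- Sample data from the question, loaded into table variables (illustrative names).
DECLARE @Authors    TABLE (AuthorID INT PRIMARY KEY, AuthorName VARCHAR(50));
DECLARE @Articles   TABLE (ArticleID INT PRIMARY KEY, Title VARCHAR(50), AuthorID INT);
DECLARE @Popularity TABLE (ID INT PRIMARY KEY, ArticleID INT, Hits INT);

INSERT INTO @Authors VALUES
  (100, 'Author One'), (200, 'Author Two'), (300, 'Author Three'),
  (400, 'Author Four'), (500, 'Author Five'), (600, 'Author Six');

INSERT INTO @Articles VALUES
  (1, 'Article One', 100),   (2, 'Article Two', 200), (3, 'Article Three', 100),
  (4, 'Article Four', 300),  (5, 'Article Five', 100), (6, 'Article Six', 300),
  (7, 'Article Seven', 500), (8, 'Article Eight', 100), (9, 'Article Nine', 600);

INSERT INTO @Popularity VALUES
  (1, 1, 20), (2, 2, 50), (3, 5, 100), (4, 3, 11), (5, 4, 21);

-- One row per author: every non-aggregated column is listed in GROUP BY.
SELECT TOP 10 au.AuthorID, au.AuthorName, SUM(ap.Hits) AS TotalHits
FROM @Authors au
JOIN @Articles ar   ON au.AuthorID = ar.AuthorID
JOIN @Popularity ap ON ap.ArticleID = ar.ArticleID
GROUP BY au.AuthorID, au.AuthorName
ORDER BY TotalHits DESC;

-- Result for the sample data:
-- AuthorID  AuthorName    TotalHits
-- 100       Author One    131        (20 + 100 + 11)
-- 200       Author Two    50
-- 300       Author Three  21
```

Authors 400, 500 and 600 do not appear because the inner joins drop any author who has no matching rows in Popularity.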

I am not sure whether you want Hits in the SELECT list, because you would then have to GROUP BY it as well. That would change SUM(Hits): each distinct Hits value would form its own group, so an author whose articles have different hit counts would be split across several rows instead of getting one accurate total.
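
To illustrate that effect, the sketch below uses only the joined rows for Author One (AuthorID 100) from the sample data, written inline as a VALUES derived table. The alias j and its column list are made up for the illustration, and the derived-table form assumes SQL Server 2008 or later.

```sql
-- Grouping by Hits as well: each distinct Hits value becomes its own group.
SELECT j.AuthorID, j.Hits, SUM(j.Hits) AS TotalHits
FROM (VALUES (100, 1, 20), (100, 5, 100), (100, 3, 11)) AS j(AuthorID, ArticleID, Hits)
GROUP BY j.AuthorID, j.Hits;
-- Returns three rows for author 100 (Hits 20, 100 and 11),
-- each with TotalHits equal to its own Hits value.

-- Grouping by the author only: a single row with the real total.
SELECT j.AuthorID, SUM(j.Hits) AS TotalHits
FROM (VALUES (100, 1, 20), (100, 5, 100), (100, 3, 11)) AS j(AuthorID, ArticleID, Hits)
GROUP BY j.AuthorID;
-- Returns one row: AuthorID 100, TotalHits 131.
```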



Answer 2:

I would do it this way. First figure out who your top ten authors are, then go get the name (and any other columns you want to pull along). For this query it's not a huge difference but all that grouping can become more complex and expensive as your output list requirements increase.

;WITH TopAuthors(AuthorID, ArticleHits) AS
(
  SELECT TOP (10) a.AuthorID, SUM(p.Hits)
    FROM dbo.Authors AS a
    INNER JOIN dbo.Articles AS ar
    ON a.AuthorID = ar.AuthorID
    INNER JOIN dbo.Popularity AS p
    ON ar.ArticleID = p.ArticleID
    GROUP BY a.AuthorID  -- required: AuthorID is selected next to SUM() without being aggregated
    ORDER BY SUM(p.Hits) DESC
)
SELECT t.AuthorID, a.AuthorName, t.ArticleHits
FROM TopAuthors AS t
INNER JOIN dbo.Authors AS a
ON t.AuthorID = a.AuthorID
ORDER BY t.ArticleHits DESC;

For this specific query, bluefeet's version (the first answer) is likely to be more efficient. But if you add more columns to the output (e.g. more information from the Authors table), the cost of grouping by all of them might outweigh the additional seek or scan in my version.



Answer 3:

Every column that appears in the SELECT list alongside an aggregate function must also be present in the GROUP BY clause. In your case, AuthorID, au.AuthorName and ArticleHits should all be present. The GROUP BY clause would therefore become:
GROUP BY AuthorID, au.AuthorName, ArticleHits
This should help.