SQL Server DISTINCT pagination with ROW_NUMBER() n

2019-03-29 23:42发布

问题:

Ok, so basically my DISTINCT is rendered useless by using ROW_NUMBER(), I need to avoid this as it's causing duplicate results (apart from a unique num of course!)

So what I'm looking for is a query that'll work the same but not have duplicate rows because num is unique:

WITH t AS
(
   SELECT DISTINCT *, ROW_NUMBER() OVER (ORDER BY Date) AS num 
   FROM Original_Import 
   LEFT JOIN eqcas.dbo.BASE_PROXY_VIEW_WITHTARGET ON ADName = Targetuser
   WHERE (BaseProxy = 'agmc' OR ADName = 'agmc')
      AND (Commited IS NULL OR Commited = 0)
)
SELECT DISTINCT ID, num, ADName, Description_User, Description_Amex, Amount, Date
FROM t 
WHERE (t.BaseProxy = 'agmc' OR t.ADName = 'agmc') 
   AND num BETWEEN 0 AND 20
   AND (Commited IS NULL OR Commited = 0)
ORDER BY Date

It's probably fairly trivial to fix, but seeing as I'm not a SQL Server guy I'm not used to these inner queries etc..

Update: yes, num is used for paging.

回答1:

Seems like I'm two years late with my recent blog post about ROW_NUMBER() being to SELECT what DENSE_RANK() is to SELECT DISTINCT. Your CTE would have to be replaced by this:

WITH t AS
(
   SELECT DISTINCT *, DENSE_RANK() OVER (ORDER BY Date, ...) AS num 
   FROM Original_Import 
   LEFT JOIN eqcas.dbo.BASE_PROXY_VIEW_WITHTARGET ON ADName = Targetuser
   WHERE (BaseProxy = 'agmc' OR ADName = 'agmc')
      AND (Commited IS NULL OR Commited = 0)
)
SELECT ...

In the above query, the DENSE_RANK()'s ORDER BY clause will need to list all columns from Original_Import and from BASE_PROXY_VIEW_WITH_TARGET, to reproduce the same ordering as the DISTINCT keyword. That will assign exactly one rank per duplicate record set, such that DISTINCT will work again.

In the referenced blog post, I have also included a link to a SQLFiddle illustrating this in a more trivial example.

SELECT DISTINCT
  v, 
  DENSE_RANK() OVER (w) row_number
FROM t
WINDOW w AS (ORDER BY v)
ORDER BY v, row_number

An explanation why DISTINCT removes duplicate rows after window functions having been calculated can be found in this post here.



回答2:

limited confidence on this as I can't test or even compile but this is the general gist of what I was thinking...

WITH t AS
(
   SELECT [insert your fields here], ROW_NUMBER() OVER (ORDER BY Date) AS num 
   FROM (
     SELECT DISTINCT *[insert your fields here]
     FROM Original_Import 
     LEFT JOIN eqcas.dbo.BASE_PROXY_VIEW_WITHTARGET ON ADName = Targetuser
     WHERE (BaseProxy = 'agmc' OR ADName = 'agmc')
      AND (Commited IS NULL OR Commited = 0)) as X
)
SELECT DISTINCT ID, num, ADName, Description_User, Description_Amex, Amount, Date
FROM t 
WHERE (t.BaseProxy = 'agmc' OR t.ADName = 'agmc') 
   AND num BETWEEN 0 AND 20
   AND (Commited IS NULL OR Commited = 0)
ORDER BY Date