Eliminate partial duplicate rows from result set

2019-07-18 03:26发布

问题:

I have a query that returns a result set similar to the one below (in reality it is far bigger, thousands of rows):

     A    | B  | C  |    D
     -----|----|----|-----
1    NULL | d0 | d0 | NULL
2    NULL | d0 | d1 | NULL
3    NULL | d0 | d2 |   a0
4      d0 | d1 | d1 | NULL
5      d0 | d2 | d2 |   a0

Two of the rows are considered duplicates, 1 and 2, because A, B and D are the same. To eliminate this, I could use SELECT DISTINCT A, B, D but then I do not get column C in my result set. Column C is necessary information for rows 3, 4 and 5.

So how do I come from the result set above to this one (the result appearing in C4 can also be NULL instead of d1):

     A    | B  | C    | D
     -----|----|------|-----
1    NULL | d0 | NULL | NULL
3    NULL | d0 | d2   |   a0
4      d0 | d1 | d1   | NULL
5      d0 | d2 | d2   |   a0

回答1:

DECLARE @YourTable TABLE (
  A VARCHAR(2)
  , B VARCHAR(2)
  , C VARCHAR(2)
  , D VARCHAR(2))

INSERT INTO @YourTable VALUES (NULL, 'd0', 'd0', NULL)
INSERT INTO @YourTable VALUES (NULL, 'd0', 'd1', NULL)
INSERT INTO @YourTable VALUES (NULL, 'd0', 'd2', 'a0')
INSERT INTO @YourTable VALUES ('d0', 'd1', 'd1', NULL)
INSERT INTO @YourTable VALUES ('d0', 'd2', 'd2', 'a0')


SELECT A, B, C = MIN(C), D
FROM @YourTable
GROUP BY A, B, D

SELECT A, B, CASE WHEN MIN(C) = MAX(C) THEN MIN(C) ELSE NULL END, D
FROM @YourTable
GROUP BY A, B, D

SELECT A, B, CASE WHEN MIN(COALESCE(C, 'dx')) = MAX(COALESCE(C, 'dx')) THEN MIN(C) ELSE NULL END, D
FROM @YourTable
GROUP BY A, B, D


回答2:

Use Dense_Rank() to partition by A, B, and D
(Thanks Lieven, for the temp table query, I had to use it for demo to be consistent ;))

According to MSDN,

The rank of a row is one plus the number of distinct ranks that come before the row in question

Partitioning by A, B, C and then sorting by A, B, C, D will give you the rank of 1 for the first distinct value where uniqueness is defined by A, B, D. That is where filtering by 1 came from.

where DenseRank = 1

Here is the result

Here is the code:

DECLARE @YourTable TABLE (
  A VARCHAR(2)
  , B VARCHAR(2)
  , C VARCHAR(2)
  , D VARCHAR(2))

INSERT INTO @YourTable VALUES (NULL, 'd0', 'd0', NULL)
INSERT INTO @YourTable VALUES (NULL, 'd0', 'd1', NULL)
INSERT INTO @YourTable VALUES (NULL, 'd0', 'd2', 'a0')
INSERT INTO @YourTable VALUES ('d0', 'd1', 'd1', NULL)
INSERT INTO @YourTable VALUES ('d0', 'd2', 'd2', 'a0')

;with DistinctTable as (
    select  *, 
    DenseRank = Dense_Rank() over (Partition By A, B, D order by A, B, C, D)
    from    @YourTable
)
select  A, B, C, D
from    DistinctTable
where   DenseRank = 1


回答3:

A subquery perhaps?

SELECT A,B,C,D FROM table1 WHERE EXISTS ( SELECT DISTINCT A,B,D FROM table1 );



回答4:

The fact you have NULLs in A and D compicates matters for any EXISTS.

Any MIN/MAX solution on C may not give you NULL as I think you want. Otherwise, use MIN(C) and a simple group by.

You have to extract the unique keys first (A, B, D), then use that to determine extract the rows again and work out what to do with C

DECLARE @TheTable TABLE (
  A varchar(2) NULL,
  B varchar(2) NULL,
  C varchar(2) NULL,
  D varchar(2) NULL
)

INSERT INTO @TheTable VALUES (NULL, 'd0', 'd0', NULL)
INSERT INTO @TheTable VALUES (NULL, 'd0', 'd1', NULL)
INSERT INTO @TheTable VALUES (NULL, 'd0', 'd2', 'a0')
INSERT INTO @TheTable VALUES ('d0', 'd1', 'd1', NULL)
INSERT INTO @TheTable VALUES ('d0', 'd2', 'd2', 'a0')

SELECT DISTINCT
    T.A,
    T.B,
    CASE Number WHEN 1 THEN T.C ELSE NULL END,
    T.D
FROM
    (SELECT
        COUNT(*) AS Number,
        A, B, D
    FROM
        @TheTable
    GROUP BY
        A, B, D
    ) UQ
    JOIN
    @TheTable T ON ISNULL(T.A, '') = ISNULL(UQ.A, '') AND ISNULL(T.B, '') = ISNULL(UQ.B, '') AND ISNULL(T.D, '') = ISNULL(UQ.D, '')


回答5:

if you have an unique id in the table, then i would go for something like this:

SELECT A,B,C,D FROM table WHERE id IN (SELECT DISTINCT A,B,D)

The problem is that you would always get the first value of C, not the first one with an value.