SQL two criteria from one group-by

2019-07-25 23:54发布

问题:

I have a table with some "functionally duplicate" records - different IDs, but the 4 columns of "user data" (of even more columns) are identical. I've got a query working that will select all records that have such duplicates.

Now I want to select, from each group of duplicates, first any of them that have column A not null - and I've verified from the data that there are at most 1 such rows per group - and if there are none in this particular group, then the minimum of column ID.

How do I select that? I can't exactly use a non-aggregate in the THEN of a CASE and an aggregate in the ELSE. E.g. this doesn't work:

SELECT CASE
           WHEN d.A IS NULL THEN d.ID
           ELSE MIN(d.ID) END,
       d.B,
       d.C,
       d.E,
       d.F
FROM TABLE T
JOIN (my duplicate query here) D ON T.B=D.B
AND T.C=D.C
AND T.E=D.E
AND T.F=D.F
GROUP BY T.B,
         T.C,
         T.E,
         T.F

Error being:

column A must appear in the GROUP BY clause or be used in an aggregate function.

回答1:

This can be radically simpler:

SELECT DISTINCT ON (b, c, e, f)
       b, c, e, f, id   -- add more columns freely
FROM   (<duplicate query here>) sub
ORDER  BY b, c, e, f, (a IS NOT NULL), id
  • Your duplicate query has all columns. No need to JOIN to the base table again.

  • Use the Postgres extension of the standard SQL DISTINCT: DISTINCT ON:

    • Select first row in each GROUP BY group?
  • Postgres has a proper boolean type. You can ORDER BY boolean expression directly. The sequence is FALSE (0), TRUE (1), NULL (NULL). If a is NULL, this expression is FALSE and sorts first: (a IS NOT NULL). The rest is ordered by id. Voilá.

  • Selection of ID happens automatically. According to your description you want the ID of the row selected in this query. Nothing more to do.

  • You can probably integrate this into your duplicate query directly.