I have a table with some "functionally duplicate" records - different IDs, but the 4 columns of "user data" (of even more columns) are identical. I've got a query working that will select all records that have such duplicates.
Now I want to select, from each group of duplicates, first any of them that have column A
not null - and I've verified from the data that there are at most 1 such rows per group - and if there are none in this particular group, then the minimum of column ID
.
How do I select that? I can't exactly use a non-aggregate in the THEN of a CASE and an aggregate in the ELSE. E.g. this doesn't work:
SELECT CASE
WHEN d.A IS NULL THEN d.ID
ELSE MIN(d.ID) END,
d.B,
d.C,
d.E,
d.F
FROM TABLE T
JOIN (my duplicate query here) D ON T.B=D.B
AND T.C=D.C
AND T.E=D.E
AND T.F=D.F
GROUP BY T.B,
T.C,
T.E,
T.F
Error being:
column A must appear in the GROUP BY clause or be used in an aggregate function.
This can be radically simpler:
Your duplicate query has all columns. No need to
JOIN
to the base table again.Use the Postgres extension of the standard SQL
DISTINCT
:DISTINCT ON
:Postgres has a proper boolean type. You can
ORDER BY
boolean expression directly. The sequence isFALSE
(0),TRUE
(1),NULL
(NULL). If a is NULL, this expression isFALSE
and sorts first:(a IS NOT NULL)
. The rest is ordered byid
. Voilá.Selection of
ID
happens automatically. According to your description you want the ID of the row selected in this query. Nothing more to do.You can probably integrate this into your duplicate query directly.