Select the last value of each column, with a singl

2019-08-05 14:05发布

问题:

Having the following data (blank means NULL):

ID    ColA    ColB    ColC
1     15              20
2     11      4       
3             3

How can I get the last not-NULL values of each column in a single query? So the resulting for the given data would be:

ColA    ColB    ColC
11      3       20

I have not found much, the function that seemed to do something similar to what I describe was COALESCE, but it does not work as expected in my case.

回答1:

Looks like you would have to run a separate query per column with plain SQL. For a small table and only 3 columns, @Guffa's query should be fine.

3 window functions

You can do the same in one query with three window functions: Not sure if this is faster than three individual subqueries:

SELECT first_value(cola) OVER (ORDER BY cola IS NULL, id DESC) AS cola
      ,first_value(colb) OVER (ORDER BY colb IS NULL, id DESC) AS colb
      ,first_value(colc) OVER (ORDER BY colc IS NULL, id DESC) AS colc
FROM   tbl
LIMIT  1;

count() as window function

You can also exploit the fact that count() does not count NULL values.

WITH x AS (
   SELECT CASE WHEN count(cola) OVER w = 1 THEN cola ELSE NULL END AS cola
         ,CASE WHEN count(colb) OVER w = 1 THEN colb ELSE NULL END AS colb
         ,CASE WHEN count(colc) OVER w = 1 THEN colc ELSE NULL END AS colc
   FROM   tbl
   -- WHERE id > x -- safe to ignore a certain portion from a large table?
   WINDOW w AS (ORDER  BY id DESC)
   )
SELECT max(cola) AS cola, max(colb) AS colb, max(colc) AS colc
FROM   x

For bigger tables and more columns, a recursive CTE or a procedural function will be considerably faster:

Recursive CTE

WITH RECURSIVE x AS (
   SELECT cola, colb, colc
         ,row_number() OVER (ORDER  BY id DESC) AS rn
   FROM   tbl
   )

   , y AS (
   SELECT rn, cola, colb, colc
   FROM   x
   WHERE  rn = 1

   UNION ALL
   SELECT x.rn
        , COALESCE(y.cola,x.cola)
        , COALESCE(y.colb,x.colb)
        , COALESCE(y.colc,x.colc)
   FROM   y
   JOIN   x ON  x.rn = y.rn + 1
   WHERE  y.cola IS NULL OR y.colb IS NULL OR y.colc IS NULL
   )
SELECT cola, colb, colc
FROM   y
ORDER  BY rn DESC
LIMIT  1;

PL/pgSQL function

My money is on this one for best performance:

CREATE OR REPLACE FUNCTION f_last_nonull(OUT cola int
                                       , OUT colb int
                                       , OUT colc int) AS
$func$
DECLARE
   r record;
BEGIN

FOR r IN
   SELECT t.cola, t.colb, t.colc 
   FROM   tbl t
   ORDER  BY t.id DESC
LOOP
   IF cola IS NULL AND r.cola IS NOT NULL THEN cola := r.cola; END IF;
   IF colb IS NULL AND r.colb IS NOT NULL THEN colb := r.colb; END IF;
   IF colc IS NULL AND r.colc IS NOT NULL THEN colc := r.colc; END IF;

   EXIT WHEN NOT (cola IS NULL OR colb IS NULL OR colc IS NULL);
END LOOP;

END
$func$ LANGUAGE plpgsql;

Call:

SELECT * FROM f_last_nonull();
cola | colb | colc
-----+------+------
 11  | 3    | 20

Test with EXPLAIN ANALYZE. Would be nice if you could come back with a comparison of the solutions.



回答2:

You can use subqueries:

select
  (select ColA from TheTable where ColA is not null order by ID desc limit 1) as ColA,
  (select ColB from TheTable where ColB is not null order by ID desc limit 1) as ColB,
  (select ColC from TheTable where ColC is not null order by ID desc limit 1) as Col


标签: sql null