Guidance on using the WITH clause in SQL

2019-03-12 16:35发布

问题:

I understand how to use the WITH clause for recursive queries (!!), but I'm having problems understanding its general use / power.

For example the following query updates one record whose id is determined by using a subquery returning the id of the first record by timestamp:

update global.prospect psp
set    status=status||'*'
where  psp.psp_id=(
           select  p2.psp_id
           from    global.prospect p2
           where   p2.status='new' or p2.status='reset'
           order   by p2.request_ts
           limit   1 )
returning psp.*;

Would this be a good candidate for using a WITH wrapper instead of the relatively ugly sub-query? If so, why?

回答1:

If there can be concurrent write access to involved tables, there are race conditions in the above following queries. Consider:

  • Postgres UPDATE … LIMIT 1

Your example can use a CTE (common table expression), but it will give you nothing a subquery couldn't do:

WITH x AS (
   SELECT  psp_id
   FROM    global.prospect
   WHERE   status IN ('new', 'reset')
   ORDER   BY request_ts
   LIMIT   1
   )
UPDATE global.prospect psp
SET    status = status || '*'
FROM   x
WHERE  psp.psp_id = x.psp_id
RETURNING psp.*;

BTW, the returned row will be the updated version.


If you wanted to insert the returned row into another table, that's where a WITH clause becomes essential:

WITH x AS (
   SELECT  psp_id
   FROM    global.prospect
   WHERE   status IN ('new', 'reset')
   ORDER   BY request_ts
   LIMIT   1
   ), y AS (
   UPDATE global.prospect psp
   SET    status = status || '*'
   FROM   x
   WHERE  psp.psp_id = x.psp_id
   RETURNING psp.*
   )
INSERT INTO z
SELECT *
FROM   y

Data modifying queries using CTE are possible with PostgreSQL 9.1 or later.
Read more in the excellent manual.



回答2:

WITH lets you define "temporary tables" for use in a SELECT query. For example, I recently wrote a query like this, to calculate changes between two sets:

-- Let o be the set of old things, and n be the set of new things.
WITH o AS (SELECT * FROM things(OLD)),
     n AS (SELECT * FROM things(NEW))

-- Select both the set of things whose value changed,
-- and the set of things in the old set but not in the new set.
SELECT o.key, n.value
    FROM o
    LEFT JOIN n ON o.key = n.key
    WHERE o.value IS DISTINCT FROM n.value

UNION ALL

-- Select the set of things in the new set but not in the old set.
SELECT n.key, n.value
    FROM o
    RIGHT JOIN n ON o.key = n.key
    WHERE o.key IS NULL;

By defining the "tables" o and n at the top, I was able to avoid repeating the expressions things(OLD) and things(NEW).

Sure, we could probably eliminate the UNION ALL using a FULL JOIN, but I wasn't able to do that in my particular case.


If I understand your query correctly, it does this:

  • Find the oldest row in global.prospect whose status is 'new' or 'reset'.

  • Mark it by adding an asterisk to its status

  • Return the row (including our tweak to status).

I don't think WITH will simplify anything in your case. It may be slightly more elegant to use a FROM clause, though:

update global.prospect psp
set    status = status || '*'
from   ( select psp_id
         from   global.prospect
         where  status = 'new' or status = 'reset'
         order  by request_ts
         limit  1
       ) p2
where  psp.psp_id = p2.psp_id
returning psp.*;

Untested. Let me know if it works.

It's pretty much exactly what you have already, except:

  • This can be easily extended to update multiple rows. In your version, which uses a subquery expression, the query would fail if the subquery were changed to yield multiple rows.

  • I did not alias global.prospect in the subquery, so it's a bit easier to read. Since this uses a FROM clause, you'll get an error if you accidentally reference the table being updated.

  • In your version, the subquery expression is encountered for every single item. Although PostgreSQL should optimize this and only evaluate the expression once, this optimization will go away if you accidentally reference a column in psp or add a volatile expression.