Test for null in function with varying parameters

2019-01-11 08:43发布

问题:

I have a Postgres function:

create function myfunction(integer, text, text, text, text, text, text) RETURNS 
table(id int, match text, score int, nr int, nr_extra character varying, info character varying, postcode character varying,
street character varying, place character varying, country character varying, the_geom geometry)
AS $$
BEGIN

return query (select a.id, 'address' as match, 1 as score, a.ad_nr, a.ad_nr_extra,a.ad_info,a.ad_postcode, s.name as street, p.name place , c.name country, a.wkb_geometry as wkb_geometry from "Addresses" a 
    left join "Streets" s on a.street_id = s.id 
        left join "Places" p on s.place_id = p.id 
            left join "Countries" c on p.country_id = c.id 
            where c.name = $7 
                and p.name = $6
                    and s.name = $5
                    and a.ad_nr = $1 
                    and a.ad_nr_extra = $2
                    and a.ad_info = $3
                    and ad_postcode = $4);
END;
$$
LANGUAGE plpgsql;

This function fails to give the right result when one or more of the variables entered are NULL because ad_postcode = NULL will fail.

What can I do to test for NULL inside the query?

回答1:

I disagree with some of the advice in other answers. This can be done with plpgsql and I think it is mostly far superior to assembling queries in a client application. It is faster and cleaner and the app only sends the bare minimum across the wire in requests. SQL statements are saved inside the database, which makes it easier to maintain - unless you want to collect all business logic in the client application, this depends on the general architecture.

General advice

  • You don't need parentheses around the SELECT with RETURN QUERY

  • Never use name and id as column names. They are not descriptive and when you join a bunch of tables (like you have to a lot in a relational database), you end up with several columns all named name or id. Duh.

  • Please format your SQL properly, at least when asking public questions. But do it privately as well, for your own good.

PL/pgSQL function

CREATE OR REPLACE FUNCTION func(
      _ad_nr       int  = NULL
    , _ad_nr_extra text = NULL
    , _ad_info     text = NULL
    , _ad_postcode text = NULL
    , _sname       text = NULL
    , _pname       text = NULL
    , _cname       text = NULL)
  RETURNS TABLE(id int, match text, score int, nr int, nr_extra text
              , info text, postcode text, street text, place text
              , country text, the_geom geometry) AS
$func$
BEGIN

-- RAISE NOTICE '%', -- for debugging
RETURN QUERY EXECUTE concat(
$$SELECT a.id, 'address'::text, 1 AS score, a.ad_nr, a.ad_nr_extra
     , a.ad_info, a.ad_postcode$$

, CASE WHEN (_sname, _pname, _cname) IS NULL THEN ', NULL::text' ELSE ', s.name' END  -- street
, CASE WHEN (_pname, _cname) IS NULL         THEN ', NULL::text' ELSE ', p.name' END  -- place
, CASE WHEN _cname IS NULL                   THEN ', NULL::text' ELSE ', c.name' END  -- country
, ', a.wkb_geometry'

, concat_ws('
JOIN   '
, '
FROM   "Addresses" a'
, CASE WHEN NOT (_sname, _pname, _cname) IS NULL THEN '"Streets"   s ON s.id = a.street_id' END
, CASE WHEN NOT (_pname, _cname) IS NULL         THEN '"Places"    p ON p.id = s.place_id' END
, CASE WHEN _cname IS NOT NULL                   THEN '"Countries" c ON c.id = p.country_id' END
)

, concat_ws('
AND    '
   , '
WHERE  TRUE'
   , CASE WHEN $1 IS NOT NULL THEN 'a.ad_nr = $1' END
   , CASE WHEN $2 IS NOT NULL THEN 'a.ad_nr_extra = $2' END
   , CASE WHEN $3 IS NOT NULL THEN 'a.ad_info = $3' END
   , CASE WHEN $4 IS NOT NULL THEN 'a.ad_postcode = $4' END
   , CASE WHEN $5 IS NOT NULL THEN 's.name = $5' END
   , CASE WHEN $6 IS NOT NULL THEN 'p.name = $6' END
   , CASE WHEN $7 IS NOT NULL THEN 'c.name = $7' END
)
)
USING $1, $2, $3, $4, $5, $6, $7;

END
$func$ LANGUAGE plpgsql;

Call:

SELECT * FROM func(1, '_ad_nr_extra', '_ad_info', '_ad_postcode', '_sname');

SELECT * FROM func(1, _pname := 'foo');
  • Since I provided default values for all function parameters, you can use positional notation, named notation or mixed notation in the function call. More in this related answer:
    Functions with variable number of input parameters

  • For more explanation on the basics of dynamic SQL, refer to this related answer:
    Refactor a PL/pgSQL function to return the output of various SELECT queries

  • The concat() function is instrumental for building the string. It was introduced with Postgres 9.1.

  • The ELSE branch of a CASE statement defaults to NULL when not present. Simplifies the code.

  • The USING clause for EXECUTE makes SQL injection impossible and allows to use parameter values directly, exactly like prepared statements.

  • NULL values are used to ignore parameters here. They are not actually used to search.

Simple SQL function

You could do it with a plain SQL function and avoid dynamic SQL. For some cases this may be faster. But I wouldn't expect it in this case. Re-planning the query with or without joins and where conditions is the superior approach. It will give you optimized query plans. Planning cost for a simple query like this is almost negligible.

CREATE OR REPLACE FUNCTION func_sql(
     _ad_nr       int  = NULL
   , _ad_nr_extra text = NULL
   , _ad_info     text = NULL
   , _ad_postcode text = NULL
   , _sname       text = NULL
   , _pname       text = NULL
   , _cname       text = NULL)
  RETURNS TABLE(id int, match text, score int, nr int, nr_extra text
              , info text, postcode text, street text, place text
              , country text, the_geom geometry) AS 
$func$

SELECT a.id, 'address' AS match, 1 AS score, a.ad_nr, a.ad_nr_extra
     , a.ad_info, a.ad_postcode
     , s.name AS street, p.name AS place
     , c.name AS country, a.wkb_geometry
FROM   "Addresses"      a
LEFT   JOIN "Streets"   s ON s.id = a.street_id
LEFT   JOIN "Places"    p ON p.id = s.place_id
LEFT   JOIN "Countries" c ON c.id = p.country_id
WHERE ($1 IS NULL OR a.ad_nr = $1)
AND   ($2 IS NULL OR a.ad_nr_extra = $2)
AND   ($3 IS NULL OR a.ad_info = $3)
AND   ($4 IS NULL OR a.ad_postcode = $4)
AND   ($5 IS NULL OR s.name = $5)
AND   ($6 IS NULL OR p.name = $6)
AND   ($7 IS NULL OR c.name = $7)

$func$ LANGUAGE sql;

Identical call.

To effectively ignore parameters with NULL values:

($1 IS NULL OR a.ad_nr = $1)

If you actually want to use NULL values as parameters, use this construct instead:

($1 IS NULL AND a.ad_nr IS NULL OR a.ad_nr = $1)  -- AND binds before OR

This also allows for indexes to be used.
Also replace all instances of LEFT JOIN with JOIN.

SQL Fiddle with simplified demo for all variants.



回答2:

You can use

c.name IS NOT DISTINCT FROM $7

It will return true if c.name and $7 are equal or both are null.

Or you can use

(c.name = $7 or $7 is null )

It will return true if c.name and $7 are equal or $7 is null.



回答3:

If you can modify the query, you could do something like

and (ad_postcode = $4 OR $4 IS NULL)


回答4:

Several things...

First, as side note: the semantics of your query might need a revisit. Some of the stuff in your where clauses might actually belong in your join clauses, like:

from ...
left join ... on ... and ...
left join ... on ... and ...

When they don't, you most should probably be using an inner join, rather than a left join.

Second, there is a is not distinct from operator, which can occasionally be handy in place of =. a is not distinct from b is basically equivalent to a = b or a is null and b is null.

Note, however, that is not distinct from does NOT use an index, whereas = and is null actually do. You could use (field = $i or $i is null) instead in your particular case, and it will yield the optimal plan if you're using the latest version of Postgres:

https://gist.github.com/ddebernardy/5884267