How do I get the type of an array's elements?

2019-05-07 21:04发布

问题:

I'm writing a polymorphic PL/pgSQL function that iterates over an array. I am interested in using FOREACH, however I cannot figure out how to declare a temporary variable with the right type.

My function is below, for more information see the comment on line 4.

CREATE OR REPLACE FUNCTION uniq(ary anyarray) RETURNS anyarray AS $$
DECLARE
  ret ary%TYPE := '{}';
  v ???; -- how do I get the element type of @ary@?
BEGIN
  IF ary IS NULL THEN
    return NULL;
  END IF;

  FOREACH v IN ARRAY ary LOOP
    IF NOT v = any(ret) THEN
      ret = array_append(ret, v);
    END IF;
  END LOOP;

  RETURN ret;
END;
$$ LANGUAGE plpgsql;

回答1:

Answer to primary question

AFAIK, you cannot declare a variable of a polymorphic type without a "template" variable or parameter.

There are related examples in the manual at the end of the chapter Declaring Function Parameters, but this trick is not covered: add another IN, INOUT or OUT parameter with data type ANYELEMENT to the function definition. It resolves to the matching element type automatically and can be (ab)used as variable inside the function body directly or as template for more variables:

CREATE OR REPLACE FUNCTION uniq1(ary ANYARRAY, v ANYELEMENT = NULL)
  RETURNS anyarray AS
$func$
DECLARE
   ret      ary%TYPE := '{}';
   some_var v%TYPE;  -- we could declare more variables now
                     -- but we don't need to
BEGIN
   IF ary IS NULL THEN
      RETURN NULL;
   END IF;

   FOREACH v IN ARRAY ary LOOP  -- instead, we can use v directly
      IF NOT v = any(ret) THEN
         ret := array_append(ret, v);
      END IF;
   END LOOP;

   RETURN ret;
END
$func$  LANGUAGE plpgsql;

Related:

  • Can I make a plpgsql function return an integer without using a variable?

Copying types like that only works in the DECLARE section and is different type casting. It is explained in the manual here.

Assign a default value, so the added parameter does not have to be included in the function call: ANYELEMENT= NULL

Call (unchanged):

SELECT uniq1('{1,2,1}'::int[]);
SELECT uniq1('{foo,bar,bar}'::text[]);

Better function

I would actually use an OUT parameter for convenience and invert the test logic:

CREATE OR REPLACE FUNCTION uniq2(ary ANYARRAY, elem ANYELEMENT = NULL
                               , OUT ret ANYARRAY)
  RETURNS anyarray AS
$func$
BEGIN
   IF ary IS NULL
      THEN RETURN;
      ELSE ret := '{}';  -- init
   END IF;

   FOREACH elem IN ARRAY ary LOOP
      IF elem = ANY(ret) THEN  -- do nothing
      ELSE
         ret := array_append(ret, elem);
      END IF;
   END LOOP;
END
$func$  LANGUAGE plpgsql;

But this still does not cover all cases containing NULL elements.

Proper function

To work for NULL elements as well:

CREATE OR REPLACE FUNCTION uniq3(ary ANYARRAY, elem ANYELEMENT = NULL
                               , OUT ret ANYARRAY)
  RETURNS anyarray AS
$func$
BEGIN
   IF ary IS NULL
      THEN RETURN;
      ELSE ret := '{}';  -- init
   END IF;

   FOREACH elem IN ARRAY ary LOOP
      IF elem IS NULL THEN  -- special test for NULL
         IF array_length(array_remove(ret, NULL), 1) = array_length(ret, 1) THEN
            ret := array_append(ret, NULL);
         END IF;
      ELSIF elem = ANY(ret) THEN  -- do nothing
      ELSE
         ret := array_append(ret, elem);
      END IF;
   END LOOP;
END
$func$  LANGUAGE plpgsql;

Checking for NULL in an array is a bit of a pain:

  • How to determine if NULL is contained in an array in Postgres?

All of these functions are just proof of concept. I would use neither. Instead:

Superior solutions with plain SQL

In Postgres 9.4 use WITH ORDINALITY to preserve original order of elements. Detailed explanation:

  • PostgreSQL unnest() with element number

Basic code for single value:

SELECT ARRAY (
   SELECT elem
   FROM  (
      SELECT DISTINCT ON (elem) elem, i
      FROM   unnest('{1,2,1,NULL,4,NULL}'::int[]) WITH ORDINALITY u(elem, i)
      ORDER  BY elem, i
      ) sub
   ORDER  BY i) AS uniq;

Returns:

uniq
------------
{1,2,NULL,4}

About DISTINCT ON:

  • Select first row in each GROUP BY group?

Built into a query:

SELECT *
FROM   test t
     , LATERAL (
   SELECT ARRAY (
      SELECT elem
      FROM  (
         SELECT DISTINCT ON (elem) elem, i
         FROM   unnest(t.arr) WITH ORDINALITY u(elem, i)
         ORDER  BY elem, i
         ) sub
      ORDER BY i) AS arr
   ) a;

This has a tiny corner case: it returns an empty array a NULL array. To cover all bases:

SELECT t.*, CASE WHEN t.arr IS NULL THEN NULL ELSE a.arr END AS arr
FROM   test t
     , LATERAL (
   SELECT ARRAY (
      SELECT elem
      FROM  (
         SELECT DISTINCT ON (elem) elem, ord
         FROM   unnest(t.arr) WITH ORDINALITY u(elem, ord)
         ORDER  BY elem, ord
         ) sub
      ORDER BY ord) AS arr
   ) a;

Or:

SELECT *
FROM   test t
LEFT   JOIN LATERAL (
   SELECT ARRAY (
      SELECT elem
      FROM  (
         SELECT DISTINCT ON (elem) elem, i
         FROM   unnest(t.arr) WITH ORDINALITY u(elem, i)
         ORDER  BY elem, i
         ) sub
      ORDER BY i) AS arr
   ) a ON t.arr IS NOT NULL;

In Postgres 9.3 or older you can substitute with generate_subscripts():

SELECT *
FROM   test t
     , LATERAL (
   SELECT ARRAY (
      SELECT elem
      FROM  (
         SELECT DISTINCT ON (t.arr[i]) t.arr[i] AS elem, i
         FROM   generate_subscripts(t.arr, 1) i
         ORDER  BY t.arr[i], i
         ) sub
      ORDER  BY i
      ) AS arr
   ) a;

We need this in sqlfiddle, which currently only supports pg 9.3, so WITH ORDINALITY is not available:

SQL Fiddle.



回答2:

I do not know how to declare a variable of base type of an anyarray argument (the documentation makes no mention of such a possibility).

You can use FOR LOOP with integer variable instead:

CREATE OR REPLACE FUNCTION uniq(ary anyarray) RETURNS anyarray AS $$
DECLARE
  ret ary%TYPE := '{}';
  i int;
BEGIN
  IF ary IS NULL THEN
    return NULL;
  END IF;

  FOR i IN array_lower(ary, 1) .. array_upper(ary, 1) LOOP
    IF NOT ary[i] = any(ret) THEN
      ret = array_append(ret, ary[i]);
    END IF;
  END LOOP;

  RETURN ret;
END;
$$ LANGUAGE plpgsql;

However, loops and variables may not be necessary:

create or replace function uniq_without_loop(arr anyarray)
returns anyarray language plpgsql as $$
begin
    return (
        select array_agg(distinct elem)
        from unnest(arr) elem);
end $$;

The version of the above function that leaves the array order unchanged:

create or replace function unsorted_uniq_without_loop(arr anyarray)
returns anyarray language plpgsql as $$
begin
    return (
        select array_agg(elem)
        from (
            select elem 
            from (
                select distinct on(elem) elem, row_number() over ()
                from unnest(array[arr]) elem
                ) sub
            order by row_number
            ) sub);
end $$;