Vector (array) addition in Postgres

2020-07-16 08:24发布

问题:

I have a column with numeric[] values which all have the same size. I'd like to take their element-wise average. By this I mean that the average of

{1, 2, 3}, {-1, -2, -3}, and {3, 3, 3}

should be {1, 1, 1}. Also of interest is how to sum these element-wise, although I expect that any solution for one will be a solution for the other.

(NB: The length of the arrays is fixed within a single table, but may vary between tables. So I need a solution which doesn't assume a certain length.)

My initial guess is that I should be using unnest somehow, since unnest applied to a numeric[] column flattens out all the arrays. So I'd like to think that there's a nice way to use this with some sort of windowing function + group by to pick out the individual components of each array and sum them.

-- EXAMPLE DATA
CREATE TABLE A
  (vector numeric[])
;

INSERT INTO A
  VALUES
    ('{1, 2, 3}'::numeric[])
    ,('{-1, -2, -3}'::numeric[])
    ,('{3, 3, 3}'::numeric[])
;

回答1:

I discovered a solution on my own which is probably the one I will use.

First, we can define a function for adding two vectors:

CREATE OR REPLACE FUNCTION vec_add(arr1 numeric[], arr2 numeric[])
RETURNS numeric[] AS
$$
SELECT array_agg(result)
FROM (SELECT tuple.val1 + tuple.val2 AS result
      FROM (SELECT UNNEST($1) AS val1
                   ,UNNEST($2) AS val2
                   ,generate_subscripts($1, 1) AS ix) tuple
      ORDER BY ix) inn;
$$ LANGUAGE SQL IMMUTABLE STRICT;

and a function for multiplying by a constant:

CREATE OR REPLACE FUNCTION vec_mult(arr numeric[], mul numeric)
RETURNS numeric[] AS
$$
SELECT array_agg(result)
FROM (SELECT val * $2 AS result
      FROM (SELECT UNNEST($1) AS val
                   ,generate_subscripts($1, 1) as ix) t
      ORDER BY ix) inn;
$$ LANGUAGE SQL IMMUTABLE STRICT;

Then we can use the PostgreSQL statement CREATE AGGREGATE to create the vec_sum function directly:

CREATE AGGREGATE vec_sum(numeric[]) (
    SFUNC = vec_add
    ,STYPE = numeric[]
);

And finally, we can find the average as:

SELECT vec_mult(vec_sum(vector), 1 / count(vector)) FROM A;


回答2:

I've written an extension to do vector addition (and subtraction, multiplication, division, and powers) with fast C functions. You can find it on Github or PGXN.

Given two arrays a and b you can say vec_add(a, b). You can also add either side to a scalar, e.g. vec_add(a, 5).

If you want a SUM aggregate function instead you can find that in aggs_for_vecs, also on PGXN.

Finally if you want to sum up all the elements of a single array, you can use aggs_for_arrays (PGXN).



回答3:

from http://www.postgresql.org/message-id/4C2504A3.4090502@wp.pl

select avg(unnested) from (select unnest(vector) as unnested from A) temp;


Edit: I think I now understand the question better.

Here is a possible solution drawing heavily upon: https://stackoverflow.com/a/8767450/3430807 I don't consider it elegant nor am I sure it will perform well:

Test data:

CREATE TABLE A
  (vector numeric[], id serial)
;

INSERT INTO A
  VALUES
    ('{1, 2, 3}'::numeric[])
    ,('{4, 5, 6}'::numeric[])
    ,('{7, 8, 9}'::numeric[])
;

Query:

select  avg(vector[temp.index])
from    A as a
join
    (select generate_subscripts(vector, 1) as index
              , id
        from    A) as temp on temp.id = a.id
group by temp.index