I have a column with numeric[] values which all have the same size. I'd like to take their element-wise average. By this I mean that the average of {1, 2, 3}, {-1, -2, -3}, and {3, 3, 3} should be {1, 1, 1}. Also of interest is how to sum these element-wise, although I expect that any solution for one will be a solution for the other.
(NB: The length of the arrays is fixed within a single table, but may vary between tables. So I need a solution which doesn't assume a certain length.)
My initial guess is that I should be using unnest somehow, since unnest applied to a numeric[] column flattens out all the arrays. So I'd like to think that there's a nice way to use this with some sort of window function + group by to pick out the individual components of each array and sum them (a sketch of this idea follows the example data below).
-- EXAMPLE DATA
CREATE TABLE A
(vector numeric[])
;
INSERT INTO A
VALUES
('{1, 2, 3}'::numeric[])
,('{-1, -2, -3}'::numeric[])
,('{3, 3, 3}'::numeric[])
;
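As a sketch of that idea (an illustration only, assuming PostgreSQL 9.4+ for unnest ... WITH ORDINALITY), the element-wise sum could be written like this:
-- Tag each element with its position, group by position,
-- then rebuild the array in position order.
SELECT array_agg(elem_sum ORDER BY pos) AS sum_vector
FROM (
    SELECT t.pos, sum(t.elem) AS elem_sum
    FROM A, unnest(A.vector) WITH ORDINALITY AS t(elem, pos)
    GROUP BY t.pos
) per_position;
Replacing sum with avg would give the element-wise average.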
I've written an extension to do vector addition (and subtraction, multiplication, division, and powers) with fast C functions. You can find it on Github or PGXN.
Given two arrays a and b you can say vec_add(a, b). You can also add either side to a scalar, e.g. vec_add(a, 5).
If you want a SUM aggregate function instead, you can find that in aggs_for_vecs, also on PGXN.
Finally, if you want to sum up all the elements of a single array, you can use aggs_for_arrays (PGXN).
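A brief usage sketch, based only on the vec_add calls described above (the array literal and the use of the question's table A are just for illustration):
SELECT vec_add(vector, '{10, 10, 10}'::numeric[]) FROM A;  -- element-wise addition
SELECT vec_add(vector, 5) FROM A;                          -- add a scalar to every element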
I discovered a solution on my own which is probably the one I will use.
First, we can define a function for adding two vectors:
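A sketch of such a function (it assumes one-dimensional arrays of equal length, and is declared STRICT so it can serve as an aggregate transition function below):
CREATE OR REPLACE FUNCTION vec_add(a numeric[], b numeric[])
RETURNS numeric[]
LANGUAGE sql IMMUTABLE STRICT AS $$
    -- Add the arrays position by position and rebuild the result in order.
    SELECT array_agg(a[i] + b[i] ORDER BY i)
    FROM generate_subscripts(a, 1) AS i;
$$;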
and a function for multiplying by a constant:
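Again as a sketch (the name vec_mult is my own choice here):
CREATE OR REPLACE FUNCTION vec_mult(a numeric[], c numeric)
RETURNS numeric[]
LANGUAGE sql IMMUTABLE STRICT AS $$
    -- Scale every element of the array by the constant c.
    SELECT array_agg(a[i] * c ORDER BY i)
    FROM generate_subscripts(a, 1) AS i;
$$;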
Then we can use the PostgreSQL statement CREATE AGGREGATE to create the vec_sum function directly:
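A sketch of the aggregate (with a STRICT transition function and no INITCOND, the first non-null array becomes the initial state):
CREATE AGGREGATE vec_sum (numeric[]) (
    SFUNC = vec_add,
    STYPE = numeric[]
);
And finally, we can find the average as (again a sketch; multiplying by 1/count introduces a little numeric rounding):
-- Element-wise average: scale the element-wise sum by 1 / row count.
SELECT vec_mult(vec_sum(vector), 1 / count(*)::numeric) AS vec_avg
FROM A;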
From http://www.postgresql.org/message-id/4C2504A3.4090502@wp.pl:
select avg(unnested) from (select unnest(vector) as unnested from A) temp;
Note, however, that this averages all the elements of all the arrays together into a single number, not element-wise.
Edit: I think I now understand the question better.
Here is a possible solution, drawing heavily upon https://stackoverflow.com/a/8767450/3430807. I don't consider it elegant, nor am I sure it will perform well:
Test data:
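(A sketch along the lines of the question's example; the table and column names here are my own assumptions.)
CREATE TABLE vectors
(
    id serial PRIMARY KEY,
    vector numeric[]
);

INSERT INTO vectors (vector) VALUES
    ('{1, 2, 3}'),
    ('{-1, -2, -3}'),
    ('{3, 3, 3}');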
Query:
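(A sketch of one way to apply the linked idea: pair each element with its subscript via generate_subscripts, group by subscript, and reassemble the array. Assumes the test table above and PostgreSQL 9.3+ for the function call in FROM that references the table.)
-- Average each position across all rows, then rebuild the array in order.
SELECT array_agg(elem_avg ORDER BY idx) AS avg_vector
FROM (
    SELECT idx, avg(v.vector[idx]) AS elem_avg
    FROM vectors v,
         generate_subscripts(v.vector, 1) AS idx
    GROUP BY idx
) per_position;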