Is it possible to do the following in Postgres:
SELECT column_name FROM information_schema WHERE table_name = 'somereport' AND data_type = 'integer';
SELECT SUM(coulmn_name[0]),SUM(coulmn_name[1]) ,SUM(coulmn_name[3]) FROM somereport;
In other words I need to select a group of columns from a table depending on certain criteria, and then sum each of those columns in the table.
I know I can do this in a loop, so I can count each column independently, but obviously that requires a query for each column returned from the information schema query. Eg:
FOR r IN select column_name from information_schema where report_view_name = 'somereport' and data_type = 'integer';
LOOP
SELECT SUM(r.column_name) FROM somereport;
END
This query creates the complete DML statement you are after:
WITH x AS (
SELECT 'public'::text AS _schema -- provide schema name ..
,'somereport'::text AS _tbl -- .. and table name once
)
SELECT 'SELECT ' || string_agg('sum(' || quote_ident(column_name)
|| ') AS sum_' || quote_ident(column_name), ', ')
|| E'\nFROM ' || quote_ident(x._schema) || '.' || quote_ident(x._tbl)
FROM x, information_schema.columns
WHERE table_schema = _schema
AND table_name = _tbl
AND data_type = 'integer'
GROUP BY x._schema, x._tbl;
You can execute it separately or wrap this query in a plpgsql function and run the query automatically with EXECUTE
:
Full automation
Tested with PostgreSQL 9.1.4
CREATE OR REPLACE FUNCTION f_get_sums(_schema text, _tbl text)
RETURNS TABLE(names text[], sums bigint[]) AS
$BODY$
BEGIN
RETURN QUERY EXECUTE (
SELECT 'SELECT ''{'
|| string_agg(quote_ident(c.column_name), ', ' ORDER BY c.column_name)
|| '}''::text[],
ARRAY['
|| string_agg('sum(' || quote_ident(c.column_name) || ')'
, ', ' ORDER BY c.column_name)
|| ']
FROM '
|| quote_ident(_schema) || '.' || quote_ident(_tbl)
FROM information_schema.columns c
WHERE table_schema = _schema
AND table_name = _tbl
AND data_type = 'integer'
);
END;
$BODY$
LANGUAGE plpgsql;
Call:
SELECT unnest(names) AS name, unnest (sums) AS col_sum
FROM f_get_sums('public', 'somereport');
Returns:
name | col_sum
---------------+---------
int_col1 | 6614
other_int_col | 8364
third_int_col | 2720642
Explain
The difficulty is to define the RETURN
type for the function, while number and names of columns returned will vary. One detail that helps a little: you only want integer
columns.
I solved this by forming an array of bigint
(sum(int_col)
returns bigint
). In addition I return an array of column names. Both sorted alphabetically by column name.
In the function call I split up these arrays with unnest()
arriving at the handsome format displayed.
The dynamically created and executed query is advanced stuff. Don't get confused by multiple layers of quotes. Basically you have EXECUTE
that takes a text argument containing the SQL query to execute. This text, in turn, is provided by secondary SQL query that builds the query string of the primary query.
If this is too much at once or plpgsql
is rather new for you, start with this related answer where I explain the basics dealing with a much simpler function and provide links to the manual for the major features.
If performance is essential query the Postgres catalog directly (pg_catalog.pg_attributes
) instead of using the standardized (but slow) information_schema.columns
. Here is a simple example with pg_attributes
.