I have a table with 300+ columns, many of these columns have no data in them. is there a query I can use to find out the names of these columns so I can remove them for the table. Also I am using a postgresql database on a redshift server if that matters
问题:
回答1:
First you get the field names
SELECT *
FROM information_schema.columns
WHERE table_schema = 'your_schema'
AND table_name = 'your_schema'
Then using a loop you create a dinamic query
SELECT count(*)
FROM 'your_schema'.'your_schema'
WHERE `yourfield` IS NOT NULL
HAVING count(*) = 0
Using a cursor with dynamic SQL in a stored procedure
回答2:
You can try to use jsonb type and functions.
Lets say that your table declared as
create table t as (x int, y varchar, z numeric);
First of all lets convert table's rows to jsonb. It is simple:
select to_jsonb(t.*) from t;
Result (for test data)
to_jsonb
--------------------------
{"x":1,"y":"a","z":null}
{"x":2,"y":"b","z":null}
Next we will convert those result to rows like (key,value) using another json function:
select jsonb_each(to_jsonb(t.*)) from t;
Result:
jsonb_each
-------------
(x,1)
(y,"""a""")
(z,null)
(x,2)
(y,"""b""")
(z,null)
It is almost what we need. Next step:
select (w).key, (w).value from (select jsonb_each(to_jsonb(t.*)) as w from t) tt;
Result
key | value
-----+-------
x | 1
y | "a"
z | null
x | 2
y | "b"
z | null
Here we use (w)
to specify that it is field instead of table.
And the last big step:
select
(w).key
from
(select jsonb_each(to_jsonb(t.*)) as w from t) tt
group by
(w).key
having
count(*) filter (where((w).value != 'null')) = 0;
Result
key
-----
z
Try to use the last query just replacing t
to your table name.
Upd:
Also you can try use PostgreSQL statistics info:
analyse yourtable;
select
pg_class.relname,
pg_attribute.attname,
pg_statistic.stanullfrac
from
pg_class join
pg_statistic on (pg_class.oid = pg_statistic.starelid) join
pg_attribute on (pg_class.oid = pg_attribute.attrelid and pg_statistic.staattnum = pg_attribute.attnum)
where
pg_class.relname = 'yourtable';
In the stanullfrac
column you will see the relative amount of nulls for each table's column where 1 means all nuls (but I am not sure how it accurate)