Searching through all columns in a table

2019-09-15 04:41发布

问题:

I have a table with 300+ columns, many of these columns have no data in them. is there a query I can use to find out the names of these columns so I can remove them for the table. Also I am using a postgresql database on a redshift server if that matters

回答1:

First you get the field names

SELECT *
FROM information_schema.columns
WHERE table_schema = 'your_schema'
  AND table_name   = 'your_schema'

Then using a loop you create a dinamic query

SELECT count(*)
FROM 'your_schema'.'your_schema'
WHERE `yourfield` IS NOT NULL
HAVING count(*) = 0

Using a cursor with dynamic SQL in a stored procedure



回答2:

You can try to use jsonb type and functions.

Lets say that your table declared as

create table t as (x int, y varchar, z numeric);

First of all lets convert table's rows to jsonb. It is simple:

select to_jsonb(t.*) from t;

Result (for test data)

         to_jsonb          
--------------------------
 {"x":1,"y":"a","z":null}
 {"x":2,"y":"b","z":null}

Next we will convert those result to rows like (key,value) using another json function:

select jsonb_each(to_jsonb(t.*)) from t;

Result:

  jsonb_each  
-------------
 (x,1)
 (y,"""a""")
 (z,null)
 (x,2)
 (y,"""b""")
 (z,null)

It is almost what we need. Next step:

select (w).key, (w).value from (select jsonb_each(to_jsonb(t.*)) as w from t) tt;

Result

 key | value 
-----+-------
 x   | 1
 y   | "a"
 z   | null
 x   | 2
 y   | "b"
 z   | null

Here we use (w) to specify that it is field instead of table.

And the last big step:

select 
  (w).key 
from 
  (select jsonb_each(to_jsonb(t.*)) as w from t) tt 
group by 
  (w).key 
having 
  count(*) filter (where((w).value != 'null')) = 0;

Result

 key 
-----
 z

Try to use the last query just replacing t to your table name.


Upd:

Also you can try use PostgreSQL statistics info:

analyse yourtable;

select
  pg_class.relname,
  pg_attribute.attname,
  pg_statistic.stanullfrac
from
  pg_class join
    pg_statistic on (pg_class.oid = pg_statistic.starelid) join
      pg_attribute on (pg_class.oid = pg_attribute.attrelid and pg_statistic.staattnum = pg_attribute.attnum)
where
  pg_class.relname = 'yourtable';

In the stanullfrac column you will see the relative amount of nulls for each table's column where 1 means all nuls (but I am not sure how it accurate)