How do you find the row count for all your tables

2019-01-02 16:26发布

I'm looking for a way to find the row count for all my tables in Postgres. I know I can do this one table at a time with:

SELECT count(*) FROM table_name;

but I'd like to see the row count for all the tables and then order by that to get an idea of how big all my tables are.

11条回答
冷夜・残月
2楼-- · 2019-01-02 16:53

I like Daniel Vérité's answer. But when you can't use a CREATE statement you can either use a bash solution or, if you're a windows user, a powershell one:

# You don't need this if you have pgpass.conf
$env:PGPASSWORD = "userpass"

# Get table list
$tables = & 'C:\Program Files\PostgreSQL\9.4\bin\psql.exe' -U user -w -d dbname -At -c "select table_name from information_schema.tables where table_type='BASE TABLE' AND table_schema='schema1'"

foreach ($table in $tables) {
    & 'C:\path_to_postresql\bin\psql.exe' -U root -w -d dbname -At -c "select '$table', count(*) from $table"
}
查看更多
与风俱净
3楼-- · 2019-01-02 16:55

I usually don't rely on statistics, especially in PostgreSQL.

SELECT table_name, dsql2('select count(*) from '||table_name) as rownum
FROM information_schema.tables
WHERE table_type='BASE TABLE'
    AND table_schema='livescreen'
ORDER BY 2 DESC;
CREATE OR REPLACE FUNCTION dsql2(i_text text)
  RETURNS int AS
$BODY$
Declare
  v_val int;
BEGIN
  execute i_text into v_val;
  return v_val;
END; 
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;
查看更多
孤独总比滥情好
4楼-- · 2019-01-02 16:57

There's three ways to get this sort of count, each with their own tradeoffs.

If you want a true count, you have to execute the SELECT statement like the one you used against each table. This is because PostgreSQL keeps row visibility information in the row itself, not anywhere else, so any accurate count can only be relative to some transaction. You're getting a count of what that transaction sees at the point in time when it executes. You could automate this to run against every table in the database, but you probably don't need that level of accuracy or want to wait that long.

The second approach notes that the statistics collector tracks roughly how many rows are "live" (not deleted or obsoleted by later updates) at any time. This value can be off by a bit under heavy activity, but is generally a good estimate:

SELECT schemaname,relname,n_live_tup 
  FROM pg_stat_user_tables 
  ORDER BY n_live_tup DESC;

That can also show you how many rows are dead, which is itself an interesting number to monitor.

The third way is to note that the system ANALYZE command, which is executed by the autovacuum process regularly as of PostgreSQL 8.3 to update table statistics, also computes a row estimate. You can grab that one like this:

SELECT 
  nspname AS schemaname,relname,reltuples
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE 
  nspname NOT IN ('pg_catalog', 'information_schema') AND
  relkind='r' 
ORDER BY reltuples DESC;

Which of these queries is better to use is hard to say. Normally I make that decision based on whether there's more useful information I also want to use inside of pg_class or inside of pg_stat_user_tables. For basic counting purposes just to see how big things are in general, either should be accurate enough.

查看更多
回忆,回不去的记忆
5楼-- · 2019-01-02 16:57

To get estimates, see Greg Smith's answer.

To get exact counts, the other answers so far are plagued with some issues, some of them serious (see below). Here's a version that's hopefully better:

CREATE FUNCTION rowcount_all(schema_name text default 'public')
  RETURNS table(table_name text, cnt bigint) as
$$
declare
 table_name text;
begin
  for table_name in SELECT c.relname FROM pg_class c
    JOIN pg_namespace s ON (c.relnamespace=s.oid)
    WHERE c.relkind = 'r' AND s.nspname=schema_name
  LOOP
    RETURN QUERY EXECUTE format('select cast(%L as text),count(*) from %I.%I',
       table_name, schema_name, table_name);
  END LOOP;
end
$$ language plpgsql;

It takes a schema name as parameter, or public if no parameter is given.

To work with a specific list of schemas or a list coming from a query without modifying the function, it can be called from within a query like this:

WITH rc(schema_name,tbl) AS (
  select s.n,rowcount_all(s.n) from (values ('schema1'),('schema2')) as s(n)
)
SELECT schema_name,(tbl).* FROM rc;

This produces a 3-columns output with the schema, the table and the rows count.

Now here are some issues in the other answers that this function avoids:

  • Table and schema names shouldn't be injected into executable SQL without being quoted, either with quote_ident or with the more modern format() function with its %I format string. Otherwise some malicious person may name their table tablename;DROP TABLE other_table which is perfectly valid as a table name.

  • Even without the SQL injection and funny characters problems, table name may exist in variants differing by case. If a table is named ABCD and another one abcd, the SELECT count(*) FROM... must use a quoted name otherwise it will skip ABCD and count abcd twice. The %I of format does this automatically.

  • information_schema.tables lists custom composite types in addition to tables, even when table_type is 'BASE TABLE' (!). As a consequence, we can't iterate oninformation_schema.tables, otherwise we risk having select count(*) from name_of_composite_type and that would fail. OTOH pg_class where relkind='r' should always work fine.

  • The type of COUNT() is bigint, not int. Tables with more than 2.15 billion rows may exist (running a count(*) on them is a bad idea, though).

  • A permanent type need not to be created for a function to return a resultset with several columns. RETURNS TABLE(definition...) is a better alternative.

查看更多
明月照影归
6楼-- · 2019-01-02 16:58

I made a small variation to include all tables, also for non-public tables.

CREATE TYPE table_count AS (table_schema TEXT,table_name TEXT, num_rows INTEGER); 

CREATE OR REPLACE FUNCTION count_em_all () RETURNS SETOF table_count  AS '
DECLARE 
    the_count RECORD; 
    t_name RECORD; 
    r table_count%ROWTYPE; 

BEGIN
    FOR t_name IN 
        SELECT table_schema,table_name
        FROM information_schema.tables
        where table_schema !=''pg_catalog''
          and table_schema !=''information_schema''
        ORDER BY 1,2
        LOOP
            FOR the_count IN EXECUTE ''SELECT COUNT(*) AS "count" FROM '' || t_name.table_schema||''.''||t_name.table_name
            LOOP 
            END LOOP; 

            r.table_schema := t_name.table_schema;
            r.table_name := t_name.table_name; 
            r.num_rows := the_count.count; 
            RETURN NEXT r; 
        END LOOP; 
        RETURN; 
END;
' LANGUAGE plpgsql; 

use select count_em_all(); to call it.

Hope you find this usefull. Paul

查看更多
人间绝色
7楼-- · 2019-01-02 17:01

I don't remember the URL from where I collected this. But hope this should help you:

CREATE TYPE table_count AS (table_name TEXT, num_rows INTEGER); 

CREATE OR REPLACE FUNCTION count_em_all () RETURNS SETOF table_count  AS '
DECLARE 
    the_count RECORD; 
    t_name RECORD; 
    r table_count%ROWTYPE; 

BEGIN
    FOR t_name IN 
        SELECT 
            c.relname
        FROM
            pg_catalog.pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
        WHERE 
            c.relkind = ''r''
            AND n.nspname = ''public'' 
        ORDER BY 1 
        LOOP
            FOR the_count IN EXECUTE ''SELECT COUNT(*) AS "count" FROM '' || t_name.relname 
            LOOP 
            END LOOP; 

            r.table_name := t_name.relname; 
            r.num_rows := the_count.count; 
            RETURN NEXT r; 
        END LOOP; 
        RETURN; 
END;
' LANGUAGE plpgsql; 

Executing select count_em_all(); should get you row count of all your tables.

查看更多
登录 后发表回答