The query:
SELECT COUNT(*) as count_all,
posts.id as post_id
FROM posts
INNER JOIN votes ON votes.post_id = posts.id
GROUP BY posts.id;
Returns n
records in Postgresql:
count_all | post_id
-----------+---------
1 | 6
3 | 4
3 | 5
3 | 1
1 | 9
1 | 10
(6 rows)
I just want to retrieve the number of records returned: 6
.
I used a subquery to achieve what I want, but this doesn't seem optimum:
SELECT COUNT(*) FROM (
SELECT COUNT(*) as count_all, posts.id as post_id
FROM posts
INNER JOIN votes ON votes.post_id = posts.id
GROUP BY posts.id
) as x;
How would I get the number of records in this context right in PostgreSQL?
I think you just need
COUNT(DISTINCT post_id) FROM votes
.See "4.2.7. Aggregate Expressions" section in http://www.postgresql.org/docs/current/static/sql-expressions.html.
EDIT: Corrected my careless mistake per Erwin's comment.
There is also
EXISTS
:Which, in PostgreSQL and with multiple entries on the
n
-side like you undoubtedly have, is generally faster thancount(DISTINCT x)
:The more rows per post there are in
votes
, the bigger the difference in performance. Just try it withEXPLAIN ANALYZE
.count(DISTINCT x)
will collect all rows, sort or hash them, and then only consider the first per identical set.EXISTS
will only scanvotes
(or, preferably, an index onpost_id
) until the first match is found.If every
post_
id is guaranteed to be present in the tableposts
(for instance, by foreign key constraint), this short form is equivalent to the longer form:This may actually be faster than the first query with
EXISTS
, with no or few entries per post.The query you had works in simpler form, too:
Benchmark
To verify my claims I ran a benchmark on my test server with limited resources. All in a separate schema:
Test setup
Fake a typical post / vote situation:
All of the following queries returned the same result (8093 of 9107 posts had votes).
I ran 4 tests with
EXPLAIN ANALYZE
(best of five) on Postgres 9.1.4 with each of the three queries and appended the resulting total runtimes.As is.
After ..
After ..
After ..
count(*) ... WHERE EXISTS
count(DISTINCT x)
- long form with joincount(DISTINCT x)
- short form without joinBest time for original query in question:
For simplified version:
@wildplasser's query with a CTE uses the same plan as the long form (index scan on posts, index scan on votes, merge join) plus a little overhead for the CTE. Best time:
Index-only scans in the upcoming PostgreSQL 9.2 can change the result for each of these queries.
Related, more detailed benchmark for Postgres 9.5 (actually retrieving distinct rows, not just counting):
Using
OVER()
andLIMIT 1
: