I'm trying to submit a query in Postgres that only returns distinct tuples. In my sample query, I do not want duplicate entries where an entry exists multiple times for a cluster_id/feed_id combination. If I do a simple:
select distinct on (cluster_info.cluster_id, feed_id)
cluster_info.cluster_id, num_docs, feed_id, url_time
from url_info
join cluster_info on (cluster_info.cluster_id = url_info.cluster_id)
where feed_id in (select pot_seeder from potentials)
and num_docs > 5 and url_time > '2012-04-16';
I get just that, but I'd also like to group according to num_docs
. So, when I do the following:
select distinct on (cluster_info.cluster_id, feed_id)
cluster_info.cluster_id, num_docs, feed_id, url_time
from url_info join cluster_info
on (cluster_info.cluster_id = url_info.cluster_id)
where feed_id in (select pot_seeder from potentials)
and num_docs > 5 and url_time > '2012-04-16'
order by num_docs desc;
I get the following error:
ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
LINE 1: select distinct on (cluster_info.cluster_id, feed_id) cluste...
I think I understand why I'm getting the error (cannot group by tuples unless I explicitly describe the group somehow) but how do I do that? Or if I am incorrect in my interpretation of the error, is there a way to accomplish my initial goal?