I have the following schema:
CREATE TABLE author (
id integer
, name varchar(255)
);
CREATE TABLE book (
id integer
, author_id integer
, title varchar(255)
, rating integer
);
And I want each author with its last book:
SELECT book.id, author.id, author.name, book.title as last_book
FROM author
JOIN book book ON book.author_id = author.id
GROUP BY author.id
ORDER BY book.id ASC
Apparently you can do that in MySQL; see "Join two tables in MySQL, returning just one row from the second table".
But PostgreSQL gives this error:
ERROR: column "book.id" must appear in the GROUP BY clause or be used in an aggregate function: SELECT book.id, author.id, author.name, book.title as last_book FROM author JOIN book book ON book.author_id = author.id GROUP BY author.id ORDER BY book.id ASC
When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column.
How can I tell PostgreSQL: "Give me only the last row in the joined table, when ordered by joined_table.id"?
Edit: With this data:
INSERT INTO author (id, name) VALUES
(1, 'Bob')
, (2, 'David')
, (3, 'John');
INSERT INTO book (id, author_id, title, rating) VALUES
(1, 1, '1st book from bob', 5)
, (2, 1, '2nd book from bob', 6)
, (3, 1, '3rd book from bob', 7)
, (4, 2, '1st book from David', 6)
, (5, 2, '2nd book from David', 6);
I should see:
book_id  author_id  name     last_book
3        1          "Bob"    "3rd book from bob"
5        2          "David"  "2nd book from David"
I've done something similar for a chat system, where room holds the metadata and list contains the messages. I ended up using PostgreSQL's LATERAL JOIN, which worked like a charm.
For more info see https://heap.io/blog/engineering/postgresqls-powerful-new-join-type-lateral
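Applied to the schema in the question, a LATERAL join could look roughly like the sketch below. The aliases a and b and the output column names are my own choices, not taken from the chat system mentioned above, and the query assumes PostgreSQL 9.3 or later.

```sql
-- Sketch only: for each author, the subquery picks the single book
-- with the highest id. Aliases a and b are illustrative.
SELECT b.id AS book_id, a.id AS author_id, a.name, b.title AS last_book
FROM author a
CROSS JOIN LATERAL (
    SELECT book.id, book.title
    FROM book
    WHERE book.author_id = a.id
    ORDER BY book.id DESC
    LIMIT 1
) b
ORDER BY b.id;
```

CROSS JOIN LATERAL drops authors without any book (John in the sample data). Using LEFT JOIN LATERAL ... ON true instead would keep them, with NULLs in the book columns.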
Here is one way:
This may look archaic and overly simple, but it does not depend on window functions, CTEs, or aggregating subqueries. In most cases it is also the fastest.
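The query itself is not preserved above. Since a later answer refers to it as using NOT EXISTS, a sketch along those lines, assuming the question's schema and with aliases of my own choosing, might be:

```sql
-- Sketch only: keep a book row only if no other book by the same
-- author has a higher id. The alias "newer" is illustrative.
SELECT b.id AS book_id, a.id AS author_id, a.name, b.title AS last_book
FROM author a
JOIN book b ON b.author_id = a.id
WHERE NOT EXISTS (
    SELECT 1
    FROM book newer
    WHERE newer.author_id = b.author_id
      AND newer.id > b.id
)
ORDER BY b.id;
```

With the sample data from the question, this returns one row for Bob (book 3) and one for David (book 5). John has no books and is dropped by the inner join.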
Check DISTINCT ON. With DISTINCT ON it is necessary to include the "distinct" columns in the ORDER BY. If that is not the order you want, then you need to wrap the query and reorder.
Another solution is to use a window function as in Lennart's answer. And another very generic one is this
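The queries from this answer are not preserved. As a hedged sketch of the two approaches it names, assuming the question's schema (the aliases latest, ranked, and rn are mine), DISTINCT ON with an outer reorder and a row_number() window function could be written as:

```sql
-- DISTINCT ON: the inner ORDER BY must start with the DISTINCT ON
-- expression (a.id); b.id DESC then decides which row is kept.
-- The outer query reorders the result by book id.
SELECT *
FROM (
    SELECT DISTINCT ON (a.id)
           b.id AS book_id, a.id AS author_id, a.name, b.title AS last_book
    FROM author a
    JOIN book b ON b.author_id = a.id
    ORDER BY a.id, b.id DESC
) latest
ORDER BY book_id;

-- Window function: number each author's books from newest to oldest
-- and keep only the first one.
SELECT book_id, author_id, name, last_book
FROM (
    SELECT b.id AS book_id, a.id AS author_id, a.name, b.title AS last_book,
           row_number() OVER (PARTITION BY a.id ORDER BY b.id DESC) AS rn
    FROM author a
    JOIN book b ON b.author_id = a.id
) ranked
WHERE rn = 1
ORDER BY book_id;
```

DISTINCT ON is PostgreSQL-specific, while the window-function form should also run on other databases that support row_number().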
You could add a condition to the join so that it matches only one row. This worked for me.
Like this:
This way you get the data from the book with the highest ID. You could add a "date" column and do the same with max(date).
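The original query is missing here. One way to express such a join condition, assuming the question's schema and using the illustrative alias b2, is a correlated subquery on max(id):

```sql
-- Sketch only: the extra join condition restricts each author
-- to the book with the highest id.
SELECT b.id AS book_id, a.id AS author_id, a.name, b.title AS last_book
FROM author a
JOIN book b
  ON b.author_id = a.id
 AND b.id = (SELECT max(b2.id)
             FROM book b2
             WHERE b2.author_id = a.id)
ORDER BY b.id;
```

For a date-based version, a hypothetical column such as book.published_at would take the place of id in both the comparison and the max() call. No such column exists in the question's schema.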
As a slight variation on @wildplasser's suggestion, which still works across implementations, you can use max rather than NOT EXISTS. This reads better if you prefer short joins to long WHERE clauses.
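The query that followed is not preserved. A sketch of the max-based join, assuming the question's schema and that book.id is unique (the schema declares no primary key), with latest and max_id as illustrative names:

```sql
-- Sketch only: compute each author's highest book id once,
-- then join back to book to fetch the title.
SELECT b.id AS book_id, a.id AS author_id, a.name, b.title AS last_book
FROM author a
JOIN (
    SELECT author_id, max(id) AS max_id
    FROM book
    GROUP BY author_id
) latest ON latest.author_id = a.id
JOIN book b ON b.id = latest.max_id
ORDER BY b.id;
```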
Or, to give the subquery a name, which clarifies things, use WITH:
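As a sketch of the WITH form, again assuming the question's schema and a unique book.id (the CTE name last_book_per_author is my own):

```sql
-- Sketch only: the same max-based lookup, with the subquery
-- moved into a named common table expression.
WITH last_book_per_author AS (
    SELECT author_id, max(id) AS max_id
    FROM book
    GROUP BY author_id
)
SELECT b.id AS book_id, a.id AS author_id, a.name, b.title AS last_book
FROM author a
JOIN last_book_per_author l ON l.author_id = a.id
JOIN book b ON b.id = l.max_id
ORDER BY b.id;
```

With the sample data from the question, this returns the two rows the asker expects: book 3 for Bob and book 5 for David.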