This is my query:
SELECT autor.entwickler,anwendung.name
FROM autor
left join anwendung
on anwendung.name = autor.anwendung;
entwickler | name
------------+-------------
Benutzer 1 | Anwendung 1
Benutzer 2 | Anwendung 1
Benutzer 2 | Anwendung 2
Benutzer 1 | Anwendung 3
Benutzer 1 | Anwendung 4
Benutzer 2 | Anwendung 4
(6 rows)
I want to keep one row for each distinct value in the field name, and discard the others, like this:
entwickler | name
------------+-------------
Benutzer 1 | Anwendung 1
Benutzer 2 | Anwendung 2
Benutzer 1 | Anwendung 3
Benutzer 1 | Anwendung 4
In MySQL I would just do:
SELECT autor.entwickler,anwendung.name
FROM autor
left join anwendung
on anwendung.name = autor.anwendung
GROUP BY anwendung.name;
But PostgreSQL gives me this error:
ERROR: column "autor.entwickler" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT autor.entwickler FROM autor left join anwendung on an ...
I totally understand the error and assume that the MySQL implementation conforms less strictly to the SQL standard than the PostgreSQL implementation. But how can I get the desired result?
PostgreSQL doesn't currently allow ambiguous GROUP BY statements where the results are dependent on the order the table is scanned, the plan used, etc. That's how the standard says it should work AFAIK, but some databases (like MySQL versions prior to 5.7) permit looser queries that just pick the first value encountered for elements appearing in the SELECT list but not in GROUP BY.

In PostgreSQL, you should use DISTINCT ON for this kind of query.

You want to write something like:
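A minimal sketch of such a query, assuming the autor and anwendung tables and the join condition from the question:

```sql
-- Sketch only: keeps one row per anwendung.name.
-- Without an ORDER BY, which entwickler survives is not guaranteed.
SELECT DISTINCT ON (anwendung.name)
       autor.entwickler, anwendung.name
FROM   autor
LEFT   JOIN anwendung ON anwendung.name = autor.anwendung;
```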
(Syntax corrected based on follow-up comment)
This is a bit like MySQL 5.7's ANY_VALUE(...) pseudo-function for group by, but in reverse - it says that the values in the distinct on clause must be unique, and any value is acceptable for the columns not specified.

Unless there's an ORDER BY, there is no guarantee as to which values are selected. You should usually have an ORDER BY for predictability.

It's also been noted that using an aggregate like min() or max() would work. While this is true - and will lead to reliable and predictable results, unlike using DISTINCT ON or an ambiguous GROUP BY - it has a performance cost due to the need for extra sorting or aggregation, and it only works for ordinal data types.

Craig's answer and your resulting query in the comments share the same flaw: The table anwendung is at the right side of a LEFT JOIN, which contradicts your obvious intent. You care about anwendung.name and pick autor.entwickler arbitrarily. I'll come back to that further down.

It should be:
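A sketch of the query with anwendung on the left side of the join, assuming the tables from the question; the alias au is an assumption chosen to match the an alias used in this answer:

```sql
-- anwendung drives the result, so every app appears once,
-- with entwickler NULL if the app has no author row.
SELECT DISTINCT ON (1)
       an.name, au.entwickler
FROM   anwendung an
LEFT   JOIN autor au ON an.name = au.anwendung;
```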
DISTINCT ON (1) is just a syntactical shorthand for DISTINCT ON (an.name). Positional references are allowed here.

If there are multiple developers (entwickler) for an app (anwendung), one developer is picked arbitrarily. You have to add an ORDER BY clause if you want the "first" (alphabetically according to your locale).

As @mdahlman implied, a more canonical way would be:
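Both variants can be sketched as follows, assuming the tables from the question and the hypothetical aliases an and au. The first adds an ORDER BY to the DISTINCT ON query so that the pick is deterministic; the second uses a plain aggregate with GROUP BY:

```sql
-- Variant 1: DISTINCT ON with ORDER BY.
-- The leading ORDER BY item must match the DISTINCT ON expression;
-- the second item decides which entwickler is kept per name.
SELECT DISTINCT ON (1)
       an.name, au.entwickler
FROM   anwendung an
LEFT   JOIN autor au ON an.name = au.anwendung
ORDER  BY 1, 2;

-- Variant 2: standard SQL with an aggregate.
SELECT an.name, min(au.entwickler) AS entwickler
FROM   anwendung an
LEFT   JOIN autor au ON an.name = au.anwendung
GROUP  BY an.name;
```

For a single extra column the two should agree: NULLs sort last in ascending order by default and min() ignores them.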
Or, better yet, clean up your data model: implement the n:m relationship between anwendung and autor properly, add surrogate primary keys as anwendung and autor are hardly unique, enforce relational integrity with foreign key constraints and adapt your resulting query:

The proper way
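A minimal sketch of what such a cleaned-up model could look like. All column names and the junction table autor_anwendung are assumptions for illustration, not taken from the question:

```sql
-- Hypothetical normalized schema: surrogate keys plus a junction
-- table that implements the n:m relationship.
CREATE TABLE autor (
  autor_id serial PRIMARY KEY,   -- surrogate primary key
  autor    text NOT NULL
);

CREATE TABLE anwendung (
  anwendung_id serial PRIMARY KEY,   -- surrogate primary key
  anwendung    text NOT NULL
);

CREATE TABLE autor_anwendung (
  autor_id     integer NOT NULL REFERENCES autor
                       ON UPDATE CASCADE ON DELETE CASCADE,
  anwendung_id integer NOT NULL REFERENCES anwendung
                       ON UPDATE CASCADE ON DELETE CASCADE,
  PRIMARY KEY (autor_id, anwendung_id)   -- each pairing stored once
);
```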
This query retrieves one row per app with one associated author (the 1st one alphabetically) or NULL if there are none:
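A sketch of such a query, assuming a hypothetical normalized schema with tables anwendung(anwendung_id, anwendung), autor(autor_id, autor) and a junction table autor_anwendung(autor_id, anwendung_id):

```sql
-- One row per app, keyed on the surrogate id so that two apps
-- sharing a name are not collapsed into one row.
SELECT DISTINCT ON (an.anwendung_id)
       an.anwendung, au.autor
FROM   anwendung an
LEFT   JOIN autor_anwendung au_an USING (anwendung_id)
LEFT   JOIN autor au USING (autor_id)
ORDER  BY an.anwendung_id, au.autor;   -- NULL sorts last, so a real author wins
```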
Result: