I have my query like this:
Select
a.abc,
a.cde,
a.efg,
a.agh,
c.dummy
p.test
max(b.this)
sum(b.sugar)
sum(b.bucket)
sum(b.something)
followed by some outer join and inner join. Now the problem is when in group by
group by
a.abc,
a.cde,
a.efg,
a.agh,
c.dummy,
p.test
The query works fine. But if I remove any one of them from group by it gives:
SQLSTATE: 42803
Can anyone explain the cause of this error?
Generally, any column that isn't in the
group by
section can only be included in theselect
section if it has an aggregating function applied to it. Or, another way, any non-aggregated data in theselect
section must be grouped on.Otherewise, how do you know what you want done with it. For example, if you group on
a.abc
, there can only be one thing thata.abc
can be for that grouped row (since all other values ofa.abc
will come out in a different row). Here's a short example, with a table containing:With the query
select LastName, Salary from Employees group by LastName
, you would expect to see:The salary for the Smiths is incalculable since you don't know what function to apply to it, which is what's causing that error. In other words, the DBMS doesn't know what to do with
123456
and111111
to get a single value for the grouped row.If you instead used
select LastName, sum(Salary) from Employees group by LastName
(ormax()
ormin()
orave()
or any other aggregating function), the DBMS would know what to do. Forsum()
, it will simply add them and give you234567
.In your query, the equivalent of trying to use
Salary
without an aggregating function is to changesum(b.this)
to justb.this
but not include it in thegroup by
section. Or alternatively, remove one of thegroup by
columns without changing it to an aggregation in theselect
section.In both cases, you'll have one row that has multiple possible values for the column.
The DB2 docs at publib for sqlstate 42803 describe your problem:
SQL will insist that any column in the SELECT section is either included in the GROUP BY section or has an aggregate function applied to it in the SELECT section.
This article gives a nice explanation of why this is the case. The article is sql server specific but the principle should be roughly similar for all RDBMS