I would like to count duplicate date values. I already know that my field "WHEN" is a timestamp, so I have to cast it to type of date.
my actual query looks like this:
SELECT
u.USERNAME,
r."WHEN",
r.UPDATEINOUT,
case (r.UPDATEINOUT) when 0 then 0 when 1 then 1 else r.INOUT end INOUT
FROM
ATTENDANT r
LEFT JOIN
USERS u ON r.USERID = u.ID
where
u.USERNAME = 'rk' and (r.UPDATEINOUT = 1 or r.UPDATEINOUT = 0 or r.UPDATEINOUT is null)
group by
r."WHEN",
INOUT,
u.USERNAME,
r.UPDATEINOUT
order by
r."WHEN"
And this is the result:
Username WHEN UPDATEINOUT INOUT
rk 09.04.2018, 14:59:45.000 [null] 0
rk 09.04.2018, 14:59:51.000 [null] 1
rk 11.04.2018, 08:31:02.000 [null] 0
rk 11.04.2018, 12:06:52.000 [null] 1
rk 11.04.2018, 12:10:29.000 [null] 0
rk 11.04.2018, 12:23:09.000 [null] 1
rk 11.04.2018, 12:43:47.000 [null] 0
rk 11.04.2018, 17:07:40.000 [null] 1
Now I would like to count the duplicate dates, the result should look like this:
Username WHEN UPDATEINOUT INOUT Count
rk 09.04.2018, 14:59:45.000 [null] 0 2
rk 09.04.2018, 14:59:51.000 [null] 1 2
rk 11.04.2018, 08:31:02.000 [null] 0 6
rk 11.04.2018, 12:06:52.000 [null] 1 6
rk 11.04.2018, 12:10:29.000 [null] 0 6
rk 11.04.2018, 12:23:09.000 [null] 1 6
rk 11.04.2018, 12:43:47.000 [null] 0 6
rk 11.04.2018, 17:07:40.000 [null] 1 6
When I add
count(cast(r."WHEN" as date))
then it shows me only one.
It seems there are answers suggesting "what to do" but not trying to explain why the results are like what they are. What is happening. It seems like the topic starter is being given a fish, not a fishing rod.
This is a bit funny. This line shows novice misconception between natural human language like English, and mathematical sets languages like SQL.
R.Kut reads this his added line as "i want to
count
how many [distinct] values of thegiven expression
are there".But that is not what this command actually means in SQL. In SQL it means "I want to
count
how many rows are there, where thegiven expression
is not null."And so there is actually NO DIFFERENCE between
count(cast(r."WHEN" as date))
andcount(r."WHEN")
- those two parameters-expressions are either bothNULL
or bothNOT NULL
. Hence the value ofcount
of those equally nullable parameters is equal too.The very attempt to truncate a value of the parameter of the aggregate function, like if it could alter nullability, is a misconception. I was there too. It takes time to get used what aggregates over mathematical sets really mean, and that you do not reading English when you are reading SQL.
Frankly, you could do just
count (1)
here removing not only the typecast but the column itself - it would still be the same, as those are rows, not values that are being asked to be counted. Unless there are rows where"WHEN" IS NULL
- that would be accounted for bygroup by
but not bycount
. After you read and think through the next section, come back and toy with http://sqlfiddle.com/#!9/ee09a/7Now, there is another parameter to the
count
function that I kind of mentioned above. It is that "distinct" parameter.Remark: One may say
distinct
is a keyword of the SQL language, not a parameter for a function, but de facto, not de jure, it changes the way the function works, so in my perception it is the parameter, in that unusual SQLish way the parameters are often given to functions. Or, another way to reason about this, it may be seen a part of the function name, if to imagine have we two functions to choose from,count
andcount-distinct
.So, the topic starter could had added
count(distinct cast(r."WHEN" as date))
instead, and.......and see that nothing changed. Because while this time he would really say to the server to count row with not-null (always only not-null!) and
distinct
values - the counting goes within the group.And what those groups are?
See, across the groups we have rows with distinct time-and-date values of "WHEN". And other columns too, but I am not focusing on them. The thing here is that within every group both time and date parts of "WHEN" are kept the same. And "being the same" means "there is one
distinct
value, repeated again and again". And if there is only onedistinct
time-and-date value then the reduced values of only-time or only-date would have the same counts (reducing a value can only make to previously different values equal, but not make previously equal values now different).Granted, in other situations, when the counting is going over columns not included in the group (or when there is not
group by
clause at all), the result may differ. Therecount(distinct ...)
might do what the topic starter is expecting it to do.http://sqlfiddle.com/#!9/0d65bf/7 - example.
However one has to keep in mind:
count
ed columns).group by
clause :-DThe code of the example, linked above:
Here we come to yet another modification to toy with.
Now, this time we are telling the server to assemble the groups, in which only "date" part of "WHEN" is the same, while "time" part might differ.
However....
group by
clause.index
made exactly by that expression, the server would have to do much extra work, probably resorting to natural scan and temporary files grouping, which would make the query heavy and slow..
Because, well, you just cannot have "One True Value" for
WHEN
in the row, if you explicitly asked server to group different values of it.Here you may come to the conclusion, that there is no simple straightforward way to have both groups and across-the-groups aggregates. Or to put it differently, to have two different sets of
group by
in one query (I do not want to mentionUNION
here, mkay?).You would need ONE set of
group by
criteria for counting rows with the same date part (but maybe different time part), and yet ANOTHERgroup by
criteria for picking and rendering groups different by both date-and-time.In plain and simple SQL 1999 that means you would have to make TWO
select
s to make both groupings one way or another, and here Yogesh's and Gordon's answer kick in.Like Gordon mentioned in post-99 SQL there came Window Functions which could let you have those criteria sets in one
select
, but they are not available in Firebird 2.x anyway.You could upgrade to Firebird 3.0 and use window functions.
Alternatively, you can use CTEs
I think you need
subquery
: