If I have a table like this:
pkey age
---- ---
1 8
2 5
3 12
4 12
5 22
I can "group by" to get a count of each age.
select age,count(*) n from tbl group by age;
age n
--- -
5 1
8 1
12 2
22 1
What query can I use to group by age ranges?
age n
----- -
1-10 2
11-20 2
20+ 1
I'm on 10gR2, but I'd be interested in any 11g-specific approaches as well.
If using Oracle 9i+, you might be able to use the
NTILE
analytic function:The caveat to NTILE is that you can only specify the number of partitions, not the break points themselves. So you need to specify a number that is appropriate. IE: With 100 rows,
NTILE(4)
will allot 25 rows to each of the four buckets/partitions. You can not nest analytic functions, so you'd have to layer them using subqueries/subquery factoring to get desired granularity. Otherwise, use:What you are looking for, is basically the data for a histogram.
You would have the age (or age-range) on the x-axis and the count n (or frequency) on the y-axis.
In the simplest form, one could simply count the number of each distinct age value like you already described:
When there are too many different values for the x-axis however, one may want to create groups (or clusters or buckets). In your case, you group by a constant range of 10.
We can avoid writing a
WHEN ... THEN
line for each range - there could be hundreds if it were not about age. Instead, the approach by @MatthewFlaschen is preferable for the reasons mentioned by @NitinMidha.Now let's build the SQL...
First, we need to split the ages into range-groups of 10 like so:
This can be achieved by dividing the age column by 10 and then calculating the result's FLOOR:
"FLOOR returns the largest integer equal to or less than n" http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions067.htm#SQLRF00643
Then we take the original SQL and replace age with that expression:
This is OK, but we cannot see the range, yet. Instead we only see the calculated floor values which are
0, 1, 2 ... n
.To get the actual lower bound, we need to multiply it with 10 again so we get
0, 10, 20 ... n
:We also need the upper bound of each range which is lower bound + 10 - 1 or
Finally, we concatenate both into a string like this:
This creates
'0-9', '10-19', '20-29'
etc.Now our SQL looks like this:
Finally, apply an order and nice column aliases:
However, in more complex scenarios, these ranges might not be grouped into constant chunks of size 10, but need dynamical clustering. Oracle has more advanced histogram functions included, see http://docs.oracle.com/cd/E16655_01/server.121/e15858/tgsql_histo.htm#TGSQL366
Credits to @MatthewFlaschen for his approach; I only explained the details.
I had to group data by how many transactions appeared in an hour. I did this by extracting the hour from the timestamp:
Giving output:
As you can see this gives a nice easy way of grouping the number of records per hour.