Oracle: how to “group by” over a range?-第2页回答

If I have a table like this:

I can "group by" to get a count of each age.

select age,count(*) n from tbl group by age;
age  n
---  -
  5  1
  8  1
 12  2
 22  1

What query can I use to group by age ranges?

  age  n
-----  -
 1-10  2
11-20  2
20+    1

I'm on 10gR2, but I'd be interested in any 11g-specific approaches as well.

标签： sql oracle plsql oracle10g

10条回答

看我几分像从前

2楼-- · 2020-02-02 07:05

If using Oracle 9i+, you might be able to use the NTILE analytic function:

WITH tiles AS (
  SELECT t.age,
         NTILE(3) OVER (ORDER BY t.age) AS tile
    FROM TABLE t)
  SELECT MIN(t.age) AS min_age,
         MAX(t.age) AS max_age,
         COUNT(t.tile) As n
    FROM tiles t
GROUP BY t.tile

The caveat to NTILE is that you can only specify the number of partitions, not the break points themselves. So you need to specify a number that is appropriate. IE: With 100 rows, NTILE(4) will allot 25 rows to each of the four buckets/partitions. You can not nest analytic functions, so you'd have to layer them using subqueries/subquery factoring to get desired granularity. Otherwise, use:

  SELECT CASE t.age
           WHEN BETWEEN 1 AND 10 THEN '1-10' 
           WHEN BETWEEN 11 AND 20 THEN '11-20' 
           ELSE '21+' 
         END AS age, 
         COUNT(*) AS n
    FROM TABLE t
GROUP BY CASE t.age
           WHEN BETWEEN 1 AND 10 THEN '1-10' 
           WHEN BETWEEN 11 AND 20 THEN '11-20' 
           ELSE '21+' 
         END

0人赞添加讨论(0) 举报

孤傲高冷的网名

3楼-- · 2020-02-02 07:07

What you are looking for, is basically the data for a histogram.

You would have the age (or age-range) on the x-axis and the count n (or frequency) on the y-axis.

In the simplest form, one could simply count the number of each distinct age value like you already described:

SELECT age, count(*)
FROM tbl
GROUP BY age

When there are too many different values for the x-axis however, one may want to create groups (or clusters or buckets). In your case, you group by a constant range of 10.

We can avoid writing a WHEN ... THEN line for each range - there could be hundreds if it were not about age. Instead, the approach by @MatthewFlaschen is preferable for the reasons mentioned by @NitinMidha.

Now let's build the SQL...

First, we need to split the ages into range-groups of 10 like so:

0-9
10-19
20 - 29
etc.

This can be achieved by dividing the age column by 10 and then calculating the result's FLOOR:

FLOOR(age/10)

"FLOOR returns the largest integer equal to or less than n" http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions067.htm#SQLRF00643

Then we take the original SQL and replace age with that expression:

SELECT FLOOR(age/10), count(*)
FROM tbl
GROUP BY FLOOR(age/10)

This is OK, but we cannot see the range, yet. Instead we only see the calculated floor values which are 0, 1, 2 ... n.

To get the actual lower bound, we need to multiply it with 10 again so we get 0, 10, 20 ... n:

FLOOR(age/10) * 10

We also need the upper bound of each range which is lower bound + 10 - 1 or

FLOOR(age/10) * 10 + 10 - 1

Finally, we concatenate both into a string like this:

TO_CHAR(FLOOR(age/10) * 10) || '-' || TO_CHAR(FLOOR(age/10) * 10 + 10 - 1)

This creates '0-9', '10-19', '20-29' etc.

Now our SQL looks like this:

SELECT 
TO_CHAR(FLOOR(age/10) * 10) || ' - ' || TO_CHAR(FLOOR(age/10) * 10 + 10 - 1),
COUNT(*)
FROM tbl
GROUP BY FLOOR(age/10)

Finally, apply an order and nice column aliases:

SELECT 
TO_CHAR(FLOOR(age/10) * 10) || ' - ' || TO_CHAR(FLOOR(age/10) * 10 + 10 - 1) AS range,
COUNT(*) AS frequency
FROM tbl
GROUP BY FLOOR(age/10)
ORDER BY FLOOR(age/10)

However, in more complex scenarios, these ranges might not be grouped into constant chunks of size 10, but need dynamical clustering. Oracle has more advanced histogram functions included, see http://docs.oracle.com/cd/E16655_01/server.121/e15858/tgsql_histo.htm#TGSQL366

Credits to @MatthewFlaschen for his approach; I only explained the details.

0人赞添加讨论(0) 举报

Lonely孤独者°

4楼-- · 2020-02-02 07:08

I had to group data by how many transactions appeared in an hour. I did this by extracting the hour from the timestamp:

select extract(hour from transaction_time) as hour
      ,count(*)
from   table
where  transaction_date='01-jan-2000'
group by
       extract(hour from transaction_time)
order by
       extract(hour from transaction_time) asc
;

Giving output:

HOUR COUNT(*)
---- --------
   1     9199 
   2     9167 
   3     9997 
   4     7218

As you can see this gives a nice easy way of grouping the number of records per hour.

0人赞添加讨论(0) 举报

Juvenile、少年°

5楼-- · 2020-02-02 07:13

SELECT CASE 
         WHEN age <= 10 THEN '1-10' 
         WHEN age <= 20 THEN '11-20' 
         ELSE '21+' 
       END AS age, 
       COUNT(*) AS n
FROM age
GROUP BY CASE 
           WHEN age <= 10 THEN '1-10' 
           WHEN age <= 20 THEN '11-20' 
           ELSE '21+' 
         END

0人赞添加讨论(0) 举报

上一页 1 2

Oracle: how to “group by” over a range?

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间