Oracle SQL for continuous grouping

2019-03-28 16:07发布

问题:

I need to generate a report from a table with the structure and data as given below.

Table Ticket has data as given below.

ID         Assigned_To
100       raju
101       raju
102       raju
103       anil
104       anil
105       sam
106       raju
107       raju
108       anil

The Oracle SELECT should generate the below report

From_Id            To_Id    Assigned_To
100                  102      raju
103                  104      anil
105                  105      sam
106                  107      raju
108                  108      anil

Can someone please help me with building a query..?

Thanks in advance, Mathew.

回答1:

SQL> create table ticket (id,assigned_to)
  2  as
  3  select 100, 'raju' from dual union all
  4  select 101, 'raju' from dual union all
  5  select 102, 'raju' from dual union all
  6  select 103, 'anil' from dual union all
  7  select 104, 'anil' from dual union all
  8  select 105, 'sam'  from dual union all
  9  select 106, 'raju' from dual union all
 10  select 107, 'raju' from dual union all
 11  select 108, 'anil' from dual
 12  /

Tabel is aangemaakt.

SQL> select min(id) from_id
  2       , max(id) to_id
  3       , assigned_to
  4    from ( select id
  5                , assigned_to
  6                , id - row_number() over (partition by assigned_to order by id) grp
  7             from ticket
  8         )
  9   group by assigned_to
 10       , grp
 11   order by from_id
 12  /

   FROM_ID      TO_ID ASSIGNED_TO
---------- ---------- -----------
       100        102 raju
       103        104 anil
       105        105 sam
       106        107 raju
       108        108 anil

5 rijen zijn geselecteerd.

**UPDATE with the results of a performance comparison with tuinstoel's solution:

On 11.1.0.7:

SQL> exec runstats_pkg.rs_start

PL/SQL procedure successfully completed.

SQL> set termout off
SQL> select min(id) from_id
  2       , max(id) to_id
  3       , assigned_to
  4    from ( select id
  5                , assigned_to
  6                , id - row_number() over (partition by assigned_to order by id) grp
  7             from ticket
  8         )
  9   group by assigned_to
 10       , grp
 11   order by from_id
 12  /

   FROM_ID      TO_ID ASSI
---------- ---------- ----
       100        102 raju
       103        104 anil
       105        105 sam
       106        107 raju
       108        108 anil
       109        111 raju
<snip>
    589921     589922 raju
    589923     589923 anil

327680 rows selected.

SQL> set termout on
SQL> exec runstats_pkg.rs_middle

PL/SQL procedure successfully completed.

SQL> set termout off
SQL> select * from table(testpl.pltest)
  2  /

   FROM_ID      TO_ID ASSI
---------- ---------- ----
       100        102 raju
       103        104 anil
       105        105 sam
       106        107 raju
       108        108 anil
       109        111 raju
<snip>
    589921     589922 raju
    589923     589923 anil

327680 rows selected.

SQL> set termout on

And the results:

SQL> exec runstats_pkg.rs_stop(100)
Run1 draaide in 547 hsecs
Run2 draaide in 549 hsecs
Run1 draaide in 99.64% van de tijd

Naam                                                      Run1        Run2    Verschil
STAT.recursive cpu usage                                     2         106         104
LATCH.row cache objects                                     91         217         126
STAT.bytes received via SQL*Net from client             37,496      37,256        -240
STAT.recursive calls                                         7       5,914       5,907
STAT.table scan rows gotten                            615,235     589,824     -25,411
STAT.sorts (rows)                                      917,504     589,824    -327,680

Run1 latches totaal versus run2 -- verschil en percentage
Run1      Run2  Verschil     Pct
10,255    10,471       216  97.94%

PL/SQL procedure successfully completed.

Regards, Rob.



回答2:

Let's say that Andrew from NZSG inspired me. I made a pipe lined function too.

create or replace package testpl is

 type outrec_type is record
 ( from_id ticket.id%type
 , to_id   ticket.id%type
 , assigned_to ticket.assigned_to%type);

 type outrec_table is table of outrec_type;

 function pltest return outrec_table pipelined;

end;
/

create or replace package body testpl is

  function pltest return outrec_table pipelined
  is
    l_outrec outrec_type;
    l_first_time boolean := true;
  begin

     for r_tick in (select id, assigned_to from ticket order by id) loop

       if (r_tick.assigned_to != l_outrec.assigned_to or l_first_time) then
          if not l_first_time then
            pipe row (l_outrec);
          else
            l_first_time := false;
          end if;
          l_outrec.assigned_to := r_tick.assigned_to;
          l_outrec.from_id := r_tick.id;
       end if;
       l_outrec.to_id := r_tick.id;
    end loop;

    pipe row (l_outrec);

    return;
  end;

end;
/

You can test it with:

select * from table(testpl.pltest);

It is approximately twice as fast as Rob van Wijk's solution on my Windows XP Oracle 11.1.0.6.0 system.

The

for r_tick in (select ....) loop
  ....
end loop;

construction has a very decent performance in Oracle 10 and 11. Most of the time SQL only solutions are faster but I think that here PL/SQL is faster.



回答3:

OK, this isn't pretty, but it works. And nobody else has contributed anything prettier yet, so maybe this is the way to do it.

select min(from_id), to_id, assigned_to from
(
select from_id, max(to_id) as to_id, assigned_to from
(
select t1.id as from_id, t2.id as to_id, t1.assigned_to
from       ticket t1
inner join ticket t2 on t1.assigned_to = t2.assigned_to and t2.id >= t1.id
where not exists
    (
    select * from ticket t3
    where  t3.ID > t1.ID
    and t3.ID < t2.ID
    and t3.assigned_to != t1.assigned_to
    )
) x
group by from_id, assigned_to
) y
group by to_id, assigned_to
;

I'm using mysql; there may well be some oracle goodness that makes this nicer - as there may well be some more elegant plain sql. But at least it's a start.



回答4:

You can bend over backwards trying to achieve this in pure SQL or you can create something a bit lengthier but both much easier to understand and more performance efficient - use a pipelined function

In essence the function would accept a ref cursor which would have to be pre-ordered by the ID, and then pipe rows only when a contiguous block of records has ended.

CREATE TABLE assignment
(
  a_id         NUMBER,
  assigned_to  VARCHAR2(4000)
);

CREATE OR REPLACE package PCK_CONTIGUOUS_GROUPBY as

 TYPE refcur_t IS REF CURSOR RETURN assignment%ROWTYPE;

 TYPE outrec_typ IS RECORD ( 
    from_id    NUMBER,
    to_id      NUMBER,
    assigned_to  VARCHAR2(4000));

  TYPE outrecset IS TABLE OF outrec_typ;

 FUNCTION f_cont_groupby(p refcur_t) 
      RETURN outrecset PIPELINED;

end;
/

CREATE OR REPLACE package body pck_contiguous_groupby as

 FUNCTION f_cont_groupby(p refcur_t) RETURN outrecset PIPELINED IS

  out_rec             outrec_typ;
  in_rec              p%ROWTYPE;
  first_id            assignment.a_id%type;
  last_id             assignment.a_id%type;
  last_assigned_to    assignment.assigned_to%type;

  BEGIN

   LOOP
     FETCH p INTO in_rec;
     EXIT WHEN p%NOTFOUND;


       IF last_id IS NULL THEN
       -- First record: don't pipe
         first_id := in_rec.a_id;

       ELSIF last_id = in_rec.a_id - 1 AND last_assigned_to = in_rec.assigned_to THEN
       -- Contiguous block: don't pipe
         NULL;

       ELSE
       -- Block not contiguous: pipe 
         out_rec.from_id := first_id;
         out_rec.to_id := last_id;
         out_rec.assigned_to := last_assigned_to;

         PIPE ROW(out_rec);

         first_id := in_rec.a_id;
       END IF;

     last_id := in_rec.a_id;
     last_assigned_to := in_rec.assigned_to;

   END LOOP;
   CLOSE p;

   -- Pipe remaining row 
   out_rec.from_id := first_id;
   out_rec.to_id := last_id;
   out_rec.assigned_to := last_assigned_to;

   PIPE ROW(out_rec);

   RETURN;
 END;

END pck_contiguous_groupby;
/

and then to try it out, populate the table and run:

SELECT * FROM TABLE(pck_contiguous_groupby.f_cont_groupby (CURSOR (SELECT a_id, assigned_to FROM assignment ORDER BY a_id)));


回答5:

this should work. The solution consist of several inline view - each of them computes something. I commented what I intended to do.You must of course read the comments from the innermost out as it is executed.

--get results by grouping by interval_begin
SELECT MIN(id) from_id,
       MAX(id) to_id,
       MAX(assigned_to) assigned_to
  FROM ( --copy ids of a first row of each interval of ids to the all following rows of that interval
         SELECT id,
                 assigned_to,
                 MAX(change_at) over(ORDER BY id) interval_begin
           FROM ( --find each id where a change of an assignee occurs and "mark" it. Dont forget the first row
                 SELECT id,
                         assigned_to,
                         CASE
                           WHEN (lag(assigned_to) over(ORDER BY id) <> assigned_to OR lag(assigned_to)
                                 over(ORDER BY id) IS NULL) THEN
                            id
                         END change_at
                   FROM ticket))
 GROUP BY interval_begin
 ORDER BY from_id;
    ;


回答6:

I did some benchmarking with Oracle express edition 10.2.0.1.0

I used this script to fill table ticket with 1179648 rows:

create table ticket (id,assigned_to)
as
select 100, 'raju' from dual union all
select 101, 'raju' from dual union all
select 102, 'raju' from dual union all
select 103, 'anil' from dual union all
select 104, 'anil' from dual union all
select 105, 'sam' from dual union all
select 106, 'raju' from dual union all
select 107, 'raju' from dual union all
select 108, 'anil' from dual
/


begin
  for i in 1..17 loop
    insert into ticket 
    select id + (select count(*) from ticket), assigned_to
    from ticket;
  end loop;
end;
/

commit;

SQL> select count(*) from ticket;

  COUNT(*)                                                                      
----------                                                                      
   1179648                                                                      

Rob van Wijk's select statement takes 1.6 seconds on average, Mesays's select statement 2.8 seconds on average, Micheal Pravda's select statement 4.2 seconds and Andrew from NZSG's statement 9.6 seconds on average.

So a pipelined function is slower in Oracle XE. Or maybe someone has to improve the pipelined function...?



回答7:

Here is my suggestion, not very well tested but in my head it sounds right, as long as id is unique and a continious sequence. Look at the bottom for the sql query.

SQL> create table ticket (id number, assigned_to varchar2(30));

Table created.

SQL> insert into ticket values (100, 'raju');

1 row created.

SQL> insert into ticket values (101, 'raju');

1 row created.

SQL> insert into ticket values (102, 'raju');

1 row created.

SQL> insert into ticket values (103, 'anil');

1 row created.

SQL> insert into ticket values (104, 'anil');

1 row created.

SQL> insert into ticket values (105, 'sam');

1 row created.

SQL> insert into ticket values (106, 'raju');

1 row created.

SQL> insert into ticket values (107, 'raju');

1 row created.

SQL> insert into ticket values (108, 'anil');

1 row created.

SQL> select a.id from_id
  2  ,lead(a.id -1, 1, a.id) over (order by a.id) to_id
  3  ,a.assigned_to
  4  from (
  5  select
  6  id, assigned_to
  7  ,lag(assigned_to, 1) over (order by id) prev_assigned_to
  8  from ticket
  9  ) a
 10  where a.assigned_to != nvl(a.prev_assigned_to, a.assigned_to||'unique')
 11  order by id
 12  ;

   FROM_ID      TO_ID ASSIGNED_TO
---------- ---------- ------------------------------
       100        102 raju
       103        104 anil
       105        105 sam
       106        107 raju
       108        108 anil


回答8:

I believe you want one of the Oracle Analytic functions. You're in luck if you're using Oracle because other RDBMSs don't have this functionality. They let you write SQL that allows you to query data in relation to adjacent rows, such as calculating moving averages. I don't have an Oracle DB here to play with, but I think it would be something like this:

SELECT MIN(ID) AS From_Id, MAX(ID) AS To_Id, Assigned_To
FROM Ticket
PARTITION BY Assigned_To
ORDER BY From_Id