Removing All but the first and last values by grou

2019-05-27 10:36发布

We have a chat system that generates multiple event logs per second sometimes for every event during a chat. The issue is that these consume a massive amount of data storage (which is very expensive on that platform) and we'd like to streamline what we actually store and delete things that really aren't necessary.

To that end, there's an event type for what position in the queue the chat is. We don't care about each position as long as they are not intervening events for that chat. So we want to keep only the first and last in each distinct group where there were no other event types to just get "total time in queue" for that period.

To complicate this, a customer can go in and out of queue as they get transferred by department, so the SAME CHAT can have multiple blocks of these queue position records. I've tried using FIRST_VALUE and LAST_VALUE and it gets me most of the way there, but fails when we have the case of two distinct blocks of these events.

Here's the script to generate the test data:

<!-- language: lang-sql -->
    CREATE TABLE #testdata (
    id varchar(18),
    name varchar(8),
    [type] varchar(20),
    livechattranscriptid varchar(18),
    groupid varchar(40)) 

INSERT INTO #testdata (id,name,[type],livechattranscriptid,groupid) VALUES 
('0DZ14000003I2pOGAS','34128314','ChatRequest','57014000000ltfIAAQ','57014000000ltfIAAQChatRequest'),
('0DZ14000003IGmQGAW','34181980','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003IHbqGAG','34185171','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003ILuHGAW','34201743','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003IQ6cGAG','34217778','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003IR7JGAW','34221794','PushAssignment','57014000000ltfIAAQ','57014000000ltfIAAQPushAssignment'),
('0DZ14000003IiDnGAK','34287448','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003IiDoGAK','34287545','PushAssignment','57014000000ltfIAAQ','57014000000ltfIAAQPushAssignment'),
('0DZ14000003Iut5GAC','34336044','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003Iv7HGAS','34336906','Accept','57014000000ltfIAAQ','57014000000ltfIAAQAccept')

And here is the attempt to identify anything that was the first and last id for it's group ordered by the name field and grouped by the transcriptid:

select *,FIRST_VALUE(id) OVER(Partition BY groupid order by livechattranscriptid,name asc) as firstinstancegroup,
    LAST_VALUE(id) OVER(Partition BY groupid order by livechattranscriptid,name asc RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) as lastinstancegroup from #testdata order by livechattranscriptid,name

The issue is, it gives me the same first and last id for ALL of them by that entire group rather than treating each group of Enqueue records as a distinct group. How would I treat each distinct grouping instance of Enqueue as a unique group?

1条回答
欢心
2楼-- · 2019-05-27 10:43

Here's a similar solution Grouping contiguous table data

not pretty but you will find the logic based from the OP. contiguous data over the same column

declare @mytable table (
    id varchar(18),
    name varchar(8),
    [type] varchar(20),
    livechattranscriptid varchar(18),
    groupid varchar(100)) 

INSERT INTO @mytable (id,name,[type],livechattranscriptid,groupid) VALUES 
('0DZ14000003I2pOGAS','34128314','ChatRequest','57014000000ltfIAAQ','57014000000ltfIAAQChatRequest'),
('0DZ14000003IGmQGAW','34181980','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003IHbqGAG','34185171','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003ILuHGAW','34201743','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003IQ6cGAG','34217778','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003IR7JGAW','34221794','PushAssignment','57014000000ltfIAAQ','57014000000ltfIAAQPushAssignment'),
('0DZ14000003IiDnGAK','34287448','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003IiDoGAK','34287545','PushAssignment','57014000000ltfIAAQ','57014000000ltfIAAQPushAssignment'),
('0DZ14000003Iut5GAC','34336044','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003Iv7HGAS','34336906','Accept','57014000000ltfIAAQ','57014000000ltfIAAQAccept')


;with myend as (   --- get all ends
 select 
 *
  from 
 (select 
   iif(groupid <> lead(groupid,1,groupid) over (order by name),
     id,
     'x') [newid],name
 from @mytable
 )x 
 where newid <> 'x'
 )
 , mystart as   -- get all starts
 (
 select 
  *
    from 
 (select 
   iif(groupid <> lag(groupid,1,groupid) over (order by name),
     id,
     'x') [newid], name,type,livechattranscriptid
 from @mytable
 )x 
 where newid <> 'x'
 )  ,
 finalstart as (   --- get all starts including the first row

  select id, 
    name,type,livechattranscriptid,
    row_number() over (order by name) rn
    from (
    select id,name,type,livechattranscriptid 
    from (
    select top 1 id, name,type,livechattranscriptid
    from @mytable
    order by name) x
    union all
    select newid,name,type,livechattranscriptid from mystart
    ) y

 ),
 finalend as   -- get all ends and add the last row
   (

  select id, 
    row_number() over (order by name) rn
    from (
    select id,name from (
    select top 1 id,name
    from @mytable
    order by name desc) x
    union all
    select newid,name from myend
    ) y
  )
select 
  s.id [startid]
  ,s.name
  ,s.type
  ,s.livechattranscriptid
  ,e.id [lastid]
   from    
  finalend e
  inner join finalstart s 
     on   e.rn = s.rn    --- bind the two results over the positions or row number
查看更多
登录 后发表回答