hive how to increment the values specific to condi

2019-07-25 15:51发布

问题:

i have table like below:

itm    numbr
1        2
11       21
NULL     31
12       41
NULL     51
13       61

i want to output with itm,numbr and with extra column as mentioned below:

   itm      numbr    incr
    1        2        1
    11       21       1
    NULL     31       2
    12       41       2
    NULL     51       3
    13       61       3

the incr should increase its value only when ITM is equal to NULL else it should display the same value as previous value.

1. i have tried with row_sequence

select itm,numbr,row_sequence() as incr from tablename;

but it displays 1,2,3,4,5,6.

2. then i have tried with

select itm,numbr,case when itm=NULL then row_sequence()+1 else row_sequence() end as incr from table;

it throws error "FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: Stateful expressions cannot be used inside of CASE".

please advice me. tia.

回答1:

Untested but analytical function should help you, as value of the incr column depends on the values of other rows:

WITH tmpincr as (
  SELECT 
      itm
    , nmbr
    -- get in tmpincr 1 for NULL, 0 for not null
    , if(isnull(itm), 1, 0) as tmpincr
   FROM tablename
)
SELECT 
    itm
  , nmbr
  -- get the sum of tmpincr for all rows before current one when ordered by itm
  , SUM(tmpincr) OVER (
      ORDER BY nmbr
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as incr
  FROM tmpincr
  ORDER BY nmbr
;


回答2:

UPDATE table_name
SET    incr = Coalesce(itm, 0) + 1

I haven't tried this solution but this adds 1 if is null otherwise increment number. Modify it to fit your needs =)



标签: sql hadoop hive