Translate code string into desc in hive

2019-07-02 00:41发布

问题:

Here we have a hyphened string like 0-1-3.... and the length is not fixed, also we have a DETAIL table in hive to explain the meaning of each code.

DETAIL | code | desc | + ---- + ---- + | 0 | AAA | | 1 | BBB | | 2 | CCC | | 3 | DDD |

Now we need a hive query to convert the code string into a description string.

For example: the case 0-1-3 should get a string like AAA-BBB-DDD.

any advice on how to get that ?

回答1:

Split your string to get an array, explode array and join with detail table (CTE is used in my example instead of it, use normal table instead) to get desc joined with code. Then assemble string using collect_list(desc) to get an array + concat_ws() to get concatenated string:

select concat_ws('-',collect_list(d.desc)) as code_desc 
from
( --initial string explode
select explode(split('0-1-3','-')) as code
) s
inner join 
(-- use your table instead of this subquery
select 0  code, 'AAA' desc union all
select 1, 'BBB' desc union all
select 2, 'CCC' desc union all
select 3, 'DDD' desc
) d on s.code=d.code;

Result:

OK
AAA-BBB-DDD
Time taken: 114.798 seconds, Fetched: 1 row(s)

In case you need to preserve the original order, then use posexplode it returns the element as well as its position in the original array. Then you can order by record ID and pos before collect_list().

If your string is a table column then use lateral view to select exploded values.

This is more complicated example with order preserved and lateral view.

select str as original_string, concat_ws('-',collect_list(s.desc)) as transformed_string
from
(
select s.str, s.pos, d.desc     
from
( --initial string explode with ordering by str and pos
  --(better use your table PK, like ID instead of str for ordering), pos
select str, pos, code from ( --use your table instead of this subquery
                             select '0-1-3' as str union all
                             select '2-1-3' as str union all
                             select '3-2-1' as str
                           )s
lateral view outer posexplode(split(s.str,'-')) v as pos,code
) s
inner join 
(-- use your table instead of this subquery
select 0  code, 'AAA' desc union all
select 1, 'BBB' desc union all
select 2, 'CCC' desc union all
select 3, 'DDD' desc
) d on s.code=d.code
distribute by s.str -- this should be record PK candidate
sort by s.str, s.pos --sort on each reducer
)s
group by str;

Result:

OK
0-1-3   AAA-BBB-DDD
2-1-3   CCC-BBB-DDD
3-2-1   DDD-CCC-BBB
Time taken: 67.534 seconds, Fetched: 3 row(s)

Note that distribute + sort is being used instead of simply order by str, pos. distribute + sort works in fully distributed mode, order by will work also correct but on single reducer.



标签: hive hiveql