hive sql find the latest record-第2页回答

hive sql find the latest record

2019-03-08 19:48发布

the table is:

create table test (
id string,
name string,
age string,
modified string)

data like this:

id    name   age  modifed
1     a      10   2011-11-11 11:11:11
1     a      11   2012-11-11 12:00:00
2     b      20   2012-12-10 10:11:12
2     b      20   2012-12-10 10:11:12
2     b      20   2012-12-12 10:11:12
2     b      20   2012-12-15 10:11:12

I want to get the latest record(include every colums id,name,age,modifed) group by id,as the data above,the correct result is:

1     a      11   2012-11-11 12:00:00
2     b      20   2012-12-15 10:11:12

I do like this:

insert overwrite table t 
select b.id, b.name, b.age, b.modified 
from (
        select id,max(modified) as modified 
        from test 
        group by id
) a 
left outer join test b on (a.id=b.id  and a.modified=b.modified);

This sql can get the right result,but when mass data，it runs slow.

**Is there any way to do this without left outer join? **

标签： sql group-by hive max

8条回答

傲

2楼-- · 2019-03-08 20:32

Give this a try:

select t1.* from test t1
join (
  select id, max(modifed) maxModified from test
  group by id
) s
on t1.id = s.id and t1.modifed = s.maxModified

Fiddle here.

Left outer join solution here.

Let us know which one runs faster :)

0人赞添加讨论(0) 举报

啃猪蹄的小仙女

3楼-- · 2019-03-08 20:35

Just slightly different approach than what has been answered in back previous answer.

Below example uses hive windowing function to find out the latest record, read more here

SELECT t.id
    ,t.name
    ,t.age
    ,t.modified
FROM (
    SELECT id
        ,name
        ,age
        ,modified
        ,ROW_NUMBER() OVER (
            PARTITION BY id ORDER BY unix_timestamp(modified,'yyyy-MM-dd hh:mm:ss') DESC
            ) AS ROW_NUMBER   
    FROM test
    ) t
WHERE t.ROW_NUMBER <= 1;

The modified is string so converting it to timestamp using unix_timestamp(modified,'yyyy-MM-dd hh:mm:ss') then applying order by on timestamp.

0人赞添加讨论(0) 举报

上一页 1 2

hive sql find the latest record

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间