the table is:
create table test (
id string,
name string,
age string,
modified string)
data like this:
id name age modifed
1 a 10 2011-11-11 11:11:11
1 a 11 2012-11-11 12:00:00
2 b 20 2012-12-10 10:11:12
2 b 20 2012-12-10 10:11:12
2 b 20 2012-12-12 10:11:12
2 b 20 2012-12-15 10:11:12
I want to get the latest record(include every colums id,name,age,modifed) group by id,as the data above,the correct result is:
1 a 11 2012-11-11 12:00:00
2 b 20 2012-12-15 10:11:12
I do like this:
insert overwrite table t
select b.id, b.name, b.age, b.modified
from (
select id,max(modified) as modified
from test
group by id
) a
left outer join test b on (a.id=b.id and a.modified=b.modified);
This sql can get the right result,but when mass data,it runs slow.
**Is there any way to do this without left outer join? **
Give this a try:
Fiddle here.
Left outer join solution here.
Let us know which one runs faster :)
Just slightly different approach than what has been answered in back previous answer.
Below example uses hive windowing function to find out the latest record, read more here
The modified is string so converting it to timestamp using
unix_timestamp(modified,'yyyy-MM-dd hh:mm:ss')
then applying order by on timestamp.