How to select the rows in original order in Hive?

Posted 2020-03-30 02:58

Question:

I want to select a fixed number of rows from mytable in their original order. As far as I know, the keyword 'limit' returns an arbitrary set of rows rather than the first ones. The rows in mytable are stored in order, and I just want to select them in that order: for example, the first 10000 rows, meaning row 1 through row 10000. How can I do this? Thanks.

Answer 1:

Try:

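-- Use a single reducer so that SORT BY produces one totally ordered output
SET mapred.reduce.tasks = 1;
-- ROW_NUMBER() OVER () numbers rows in the order they arrive,
-- which is not guaranteed to match the order in which they were stored
SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER () AS row_num
    FROM mytable ) table1
SORT BY row_num LIMIT 10000;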


Answer 2:

The rows in your table may be stored in order, but tables are read in parallel, and results returned from different mappers or reducers do not come back in the original order. That is why you need to know the rule that defines the "original order". If you know it, you can use row_number() or order by. For example:

select * from mytable order by ... limit 10000;
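
To make that concrete, here is a minimal sketch. It assumes mytable has a column named id that records the original row order. That column is hypothetical and does not come from the question, so substitute whatever column or expression actually defines the order in your data.

```sql
-- Assumption: `id` is a hypothetical column that encodes the original row order.

-- Variant 1: a global ORDER BY makes LIMIT deterministic.
SELECT *
FROM mytable
ORDER BY id
LIMIT 10000;

-- Variant 2: number the rows explicitly, then filter on the row number.
-- The outer ORDER BY is still needed, because the output order of a query
-- without one is not guaranteed.
SELECT *
FROM (
    SELECT t.*, ROW_NUMBER() OVER (ORDER BY id) AS row_num
    FROM mytable t
) numbered
WHERE row_num <= 10000
ORDER BY row_num;
```

The second form is probably only worth the extra step if you need row_num in the output or want a range that does not start at row 1, for example row_num between 10001 and 20000.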



Tags: hive